A. BACKGROUND


The two-dimensional scatter plot has been hailed by many
statisticians as being the single most powerful tool used in
Exploratory Data Analysis [Ref. 1]. A scatter plot presents
an entire data set in a compact, unambiguous and easily
understandable format, in which either:

1. the points lie in a nearly straight line;

2. the points almost lie on a smooth curve;

3. the points are scattered without any apparent
correlation between the X variables and the Y variables;

4. the points lie somewhere between (1) or (2) and (3);

5. most of the points lie near a straight line or smooth
curve but a few outliers are separated from the rest.
[Ref. 2]


These patterns or other hidden peculiarities are much easier
to discover during a brief glimpse at a well prepared
scatter plot than during an examination of a data table. For
example, the strong positive correlation between total users
and active users logged on to the W.R. Church computer
system, Figure 1.1, is more easily discerned from the
plotted points than from the tabulated data.¹ This is a
good example of case (1), described above.

Not only does this plot point out the positive trend in
the data, it also demonstrates that the trend is nearly
linear and provides a rough estimate of the relationship
between the variables.


¹ The table in Figure 1.1 contains only a small portion of
the 472 data points included in the plot. A complete listing
of the data set takes approximately two pages of text and is
not required for demonstration purposes.
Figure 1.1 Comparison of Data Presentation Methods.


More precise mathematical expressions and confirmatory
procedures, including goodness of fit measures, can be
obtained by employing classical regression analysis
techniques, a logical enhancement of simple scatter plots,
Figure 1.2. Numerical quantifications such as the Pearson
product moment correlation also provide summaries but can be
ambiguous if not accompanied by other information [Ref. 1].

Scatter plots are not invulnerable to misinterpretation.
When the scatter of points falls into category (4) or (5),
as in Figure 1.3, it may not be possible to judge the true
relationship between the variables during a quick glance at
the scatter plot, although there obviously is some
relationship. Figure 1.3 contains a plot of the first 200
points of test set two (Appendix C) which is used in Chapter
III, section 2 to test LOWESS' ability to follow abrupt
changes in curvature.
[Plot: ACTIVE USERS versus TOTAL USERS with fitted line
Y = -0.22476 + 0.64118 * X]


Figure 1.2 Linear Least Squares Regression of
Active Users on Total Users Logged on to the
W.R. Church Computer System.







Figure 1.3 Scatter Plot of the First 200 Points 
of Test Set Two. 


Initial inspection of this data suggests the presence of
a quadratic type pattern. This impression leads naturally to
using the quadratic least squares regression line of Figure
1.4 to describe the dependence of Y on X. The accompanying
analysis of variance table lends some support to this
choice, since R² = .709.

A closer examination of this data reveals, however, that
although it looks quadratic, the actual dependence of Y on X





Y = +/C x X*0 1 2     WHERE: C = -0.26565 0.54139 -0.013564


ANALYSIS OF VARIANCE TABLE


SOURCE                        SS       DF       MS          F
GRAND MEAN (SEE NOTE)     2215.056      1
REGRESSION                 523.637      2    261.818    239.551
RESIDUAL                   215.312    197      1.093
TOTAL                     2954.005    200     14.770


THE SIGNIFICANCE LEVEL OF REGRESSION = .0000
(SIGNIFICANCE LEVEL = AREA UNDER CURVE BEYOND COMPUTED F)
R SQUARE (SEE NOTE) = .709


* NOTE: IN WEIGHTED CASE, SEE DESCRIPTION FOR MEANING


Figure 1.4 Quadratic Regression on the First 200 
Points of Test Set Two. 


is not described quite that simply. Figure 1.5 demonstrates
this point very clearly. Splitting the data set into three
parts at what appear to be logical break points (X = 10, 25),
and fitting a linear least squares regression line to each,
shows that Y is not a single function of X over the entire
range. Instead, there appear to be three separate linear
trends in this data.

Analyses of this type are seldom undertaken because of
the tedium involved in selecting appropriate splitting
points once it has been determined that doing so may be
helpful.

How then, can an analyst discover the existence of
subtle trends or define the shape of unusual patterns
contained in a scatter plot? The answer is to use local
smoothing procedures rather than global (regression) fitting
[Plot: Y versus X with three fitted lines:
Y = 2.9692 + 0.10504 * X
Y = 14.146 + -0.55146 * X
Y = 0.051489 + 0.359619 * X]


Figure 1.5 Linear Regressions on First 200 Points of
Test Set Two Split at X = 10 and 25.


techniques. Using a flexible smoothing procedure that
responds to local changes in the data structure allows the
data itself to determine the shape of the final curve, as
opposed to the classical approach of fitting polynomials
which have predetermined shapes.

The Robust Locally Weighted Regression and Scatterplot
Smoothing (LOWESS) procedure [Ref. 3], described in the
remainder of this paper, is a very good method for
preventing the acceptance of assumptions like the one that
led to using the quadratic model in Figure 1.4. The LOWESS
smoothing technique applied to this data, the right hand
plot of Figure 1.6, shows very clearly that the dependence
of Y on X resembles a combination of three distinct linear
functions (the parameter F=.25 will be explained later).
The LOWESS smoothing process has a tendency to round angular
corners. The straight lines in the center of each segment
suggest linear trends similar to those contained in Figure
1.5.

The major problem with trying to use polynomials to
depict subtle trends or to describe unusual relationships in
a data set is that they are neither flexible nor local. By
way of example, the points on either extreme of the first of
the two plots in Figure 1.6 have a significant effect on
the middle of the fitted polynomials.


[Panels: QUADRATIC REGRESSION; LOWESS F = .25]


Figure 1.6 Comparison of Quadratic Regression and LOWESS
Smoothing (F=.25) of the First 200 Points of Test Set Two.


The LOWESS procedure, on the other hand, allows the data
points themselves to determine the shape of the smoothed
curve. Figure 1.6 also demonstrates that global polynomial
regressions have a more difficult time following abrupt
pattern changes than do local smoothing procedures.


B. SCOPE 


Locally Weighted Regression and Scatterplot Smoothing
(LOWESS), introduced by William S. Cleveland in 1977
[Ref. 3], is a generalized extension of the locally fitted
polynomial smoothing techniques used for many years in the
field of time series¹ analysis.


¹ A time series is a sequence of random variables Yi which
are naturally ordered by time (i) and can therefore be
presented as a scatter plot of Yi versus i. Although i is
usually the integers, missing values can occur.

The essential idea behind the simplest of these classical
smoothing techniques is the following. If the data points
(Xi,Yi) come from an additive model of the form

$$ Y_i = G(X_i) + \epsilon_i $$

where $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$,
and G(Xi) can be approximated locally, over the interval
i-M, ..., i, i+1, ..., i+M, by the linear function

$$ Y_i = \beta_0(X_i) + \beta_1(X_i) X_i + \epsilon_i $$

then averaging the Yi over this range yields

$$ \bar{Y}_i = \frac{1}{2M+1} \sum_{j=-M}^{M} Y_{i+j} $$

where

$$ E(\bar{Y}_i) = \beta_0(X_i) + \beta_1(X_i) X_i $$

$$ \mathrm{Var}(\bar{Y}_i) = \mathrm{Var}(\bar{\epsilon}_i) = \frac{\sigma^2}{2M+1} $$

If the assumption that the εi are uncorrelated is true, then
this moving average process produces estimates Ȳi that are
unbiased and have smaller variance than the raw Yi's. This
technique makes it easier to distinguish G(Xi) through the
noise (εi). Using a bandwidth, M, larger than the interval
over which the linearity assumption holds will introduce
bias into the results [Ref. 4].
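
The variance reduction described above can be illustrated with a short sketch. It is not one of the thesis programs: the function name `moving_average`, the straight-line model G(x) = 2 + 0.5x, the half-width m = 5 and the random seed are all assumptions chosen for illustration.

```python
import random
import statistics

def moving_average(y, m):
    """Centered moving average with half-width m.

    Returns the (2m+1)-point means for indices m .. len(y)-m-1;
    the m points at each end have no full window and are dropped.
    """
    width = 2 * m + 1
    return [sum(y[i - m:i + m + 1]) / width for i in range(m, len(y) - m)]

# Hypothetical data: equally spaced X, linear G, Normal(0,1) noise.
random.seed(0)
x = list(range(200))
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]

m = 5
smooth = moving_average(y, m)

# Deviations from the true line, before and after smoothing.
raw_resid = [y[i] - (2 + 0.5 * x[i]) for i in range(m, len(y) - m)]
smooth_resid = [s - (2 + 0.5 * x[i + m]) for i, s in enumerate(smooth)]

print(len(smooth))                           # N - 2m smoothed points
print(statistics.pvariance(raw_resid))       # should be near sigma^2 = 1
print(statistics.pvariance(smooth_resid))    # should be near 1/(2m+1)
```

Because G is exactly linear here, the average introduces no bias; with a curved G and a large m the smoothed values would drift away from G(Xi), which is the bias mentioned above.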

The purpose of this thesis is to translate the
generalization of classical smoothing techniques proposed by
Cleveland [Ref. 3], and expounded upon by Chambers et al
[Ref. 1], into user friendly computer programs available for
use as exploratory data analysis tools by students and
faculty of the Naval Postgraduate School.

LOWESS, written in APL, an acronym for "A PROGRAMMING
LANGUAGE," was designed to be used alone or in conjunction
with the IBM GRAFSTAT statistical graphics package.
GRAFSTAT, an experimental program currently under
development by the IBM Watson Research Center, is available
at the Naval Postgraduate School for test and evaluation
purposes [Ref. 5]. All graphs contained in this paper were
produced by the GENERAL PLOT function of the GRAFSTAT program.

LOWS, a modification of LOWESS, when used in conjunction
with GRAFSTAT and expanded versions of the DRAFTSMAN DISPLAY
programs described in [Ref. 6], enhances an already powerful
exploratory data analysis package.

A FORTRAN version of the basic LOWESS program was
designed to be used in conjunction with either DISPLA
[Ref. 7], or any other W.R. Church computer system supported
graphing package.

These programs are interactive and can be used easily by
individuals who have little or no APL or FORTRAN programming
skills. Users who are well versed in these languages should
be able to modify them to provide tailor made outputs,
expand their capabilities or incorporate them into other
analysis packages.

Detailed user instructions are contained in Chapters IV
and V while examples of their use are presented in Chapter
III. Users who are interested in the mathematical details
of Robust Locally Weighted Regression and Scatterplot
Smoothing should read Chapter II.


A. OVERVIEW 


Locally Weighted Regression and Scatterplot Smoothing
(LOWESS) is a generalized extension of locally fitted
polynomial smoothing techniques used by many statisticians in
time series analysis.¹ Unlike its predecessors, however,
LOWESS was designed to work on unequally as well as equally
spaced X's. It also contains a robust fitting procedure
that guards against possible distortion of the smoothed
curve by outlier points. The general procedure used by
Cleveland is an adaptation of iterated least squares
regression techniques developed by Albert Beaton and John
Tukey [Ref. 8].

The overall objective of LOWESS, like most smoothing or
regression routines, is to compute a "fitted" value, Ŷ, that
depicts the middle of the empirical distribution of Y at
each X. Unfortunately, most data sets do not contain enough
repeated observations at each X to provide a good estimate
of the middle of this distribution. LOWESS derives its
estimate of Ŷi from the equation of a weighted least squares
regression line fitted to a set of data points whose X
values are located in a user defined neighborhood about Xi
(X value of the point being smoothed).


B. MATHEMATICAL DETAILS: NON-ROBUST LOWESS SMOOTHING


The first step in generating a LOWESS smoothed point
consists of forming a neighborhood, Figure 2.1, centered
around Xi and comprised of its Q nearest neighbors. The user
determines Q by choosing the parameter F, which is
approximately equal to the fraction of the data points
used in computing each fitted value. Q is (F x N) rounded to
the nearest integer, and the Q nearest neighbors are those
points whose X values are closest to Xi. Note that there
are not necessarily an equal number of neighborhood points
on either side of Xi. Also, Xi is considered to be a
neighbor of itself. The parameters F and Q, determined
prior to smoothing the first data point, are held constant
and used throughout the procedure.

¹ A brief theoretical explanation of these techniques was
presented in Chapter I.





Figure 2.1 Vertical Strip Containing the 10 Nearest
Neighbors of X6 in Data Set Two.


In Figure 2.1, the point to be smoothed, X6, is
highlighted by a dotted line and the strip boundaries are
delineated by solid lines passing through X1 and X10.
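
A minimal sketch of the neighborhood selection in STEP ONE follows. It is an illustration only, not the thesis program: the function name `nearest_neighbors`, the evenly spaced stand-in data and the tie-breaking rule (lower index first) are assumptions.

```python
def nearest_neighbors(x, i, f):
    """Indices of the Q = round(F * N) points whose X values are closest to x[i].

    The point i counts as its own neighbor. Ties in distance are broken
    by taking the lower index first (an assumption of this sketch).
    """
    n = len(x)
    q = max(1, min(n, int(f * n + 0.5)))      # (F x N) rounded to nearest integer
    order = sorted(range(n), key=lambda k: (abs(x[k] - x[i]), k))
    return sorted(order[:q])

# Hypothetical stand-in for a 20-point data set with evenly spaced X values.
x = [float(v) for v in range(1, 21)]

print(nearest_neighbors(x, 5, 0.5))    # neighborhood of X6 (index 5)
print(nearest_neighbors(x, 19, 0.5))   # neighborhood of X20 (index 19)
```

With these stand-in values the neighborhood of X6 runs from X1 to X10, and every neighbor of X20 lies to its left, so the number of points on either side of Xi is not necessarily equal.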

STEP TWO consists of defining the local weighting
function and calculating individual weights for each point,
(Xk,Yk), in the strip formed during STEP ONE. This weighting
function is to be centered at Xi and scaled so that it hits
zero for the first time at the Qth nearest neighbor of Xi
(the strip boundary furthest from Xi). Functions having the
following properties will satisfy these requirements:
1. W(u) > 0 for |u| < 1 (positivity),
2. W(-u) = W(u) (symmetry),
3. W(u) is a nonincreasing function for u ≥ 0,
4. W(u) = 0 for |u| ≥ 1.


Cleveland [Ref. 3] suggests using a tricube weight function
of the form:

$$ W(u) = \begin{cases} (1 - |u|^3)^3 & \text{for } |u| < 1 \\ 0 & \text{otherwise} \end{cases} $$

Note that this function uses the absolute value of u. The
weight given to any point within the strip is calculated by:

$$ w_k(u) = W\left(\frac{X_i - X_k}{D_i}\right) $$


The variable Di is the distance along the X axis from Xi to
its Qth nearest neighbor. This is the distance from X6 to
the left hand boundary in Figure 2.1. When LOWESS starts
its smoothing pass at X1, the right hand boundary passes
through its Qth nearest neighbor, X10 in this example. The
neighborhood which, at that time, contains the points X1...
XQ remains fixed until the distance (Xi-X1) is greater than
(XQ-Xi). This usually occurs at i = Q/2 for evenly spaced
data. At this point the neighborhood is advanced and the Qth
nearest neighbor shifts to the left hand boundary where it
remains until all of the data points have been smoothed. Di
therefore, is generally the distance from Xi to the right
hand boundary for i = 1...(Q/2) and is the distance from Xi
to the left hand boundary for i = (Q/2)...N.

The weight given to any point in the strip is equal to
the height of the curve, W(u), at Xk, Figure 2.2. This
figure demonstrates that the tricube weight function:

1. gives the largest weight to the point being smoothed;
2. decreases smoothly as Xk moves away from Xi;
3. is symmetric about the point being smoothed;
4. hits zero for the first time at the Qth nearest
neighbor of Xi.
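
The four properties listed above can be checked numerically with a small sketch of the tricube weights. The helper names `tricube` and `local_weights` and the ten evenly spaced X values are assumptions made for illustration; they are not taken from the thesis programs.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_weights(x, i, neighbors):
    """Weights w_k = W((X_i - X_k) / D_i) for each index k in the strip.

    D_i is the distance from X_i to its farthest neighbor in the strip.
    If D_i is zero, every point shares the abscissa X_i and gets weight 1.
    """
    d = max(abs(x[k] - x[i]) for k in neighbors)
    if d == 0:
        return {k: 1.0 for k in neighbors}
    return {k: tricube((x[i] - x[k]) / d) for k in neighbors}

# Hypothetical strip: X1..X10 evenly spaced, point being smoothed is X6.
x = [float(v) for v in range(1, 11)]
w = local_weights(x, 5, range(10))

for k in range(10):
    print(f"X{k + 1}: weight {w[k]:.4f}")
```

In this stand-in strip the farthest neighbor of X6 is X1, so X1 receives weight zero while X6 itself receives weight one.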


[Plot: WEIGHT versus X]


Figure 2.2 TRICUBE Weight Function for the 10 Nearest
Neighbors of X6 in Data Set Two.
In cases where several points have abscissas equal to
Xi, all of them are given weight 1. If Di is zero, meaning
that all Q points in the strip have abscissas equal to Xi,
it is impossible to estimate the slope of a fitted line. In
this instance, a constant equal to the mean Y value for all
Q points is fitted to the point Xi.

STEP THREE uses weighted least squares regression to fit
a polynomial of degree P to the data points that lie within
the strip containing Xi. The parameters of the equation
that describes this line are the values of βj, j = 0,1,...P
that minimize:

$$ \sum_{k=1}^{Q} w_k(u) \left( Y_k - \beta_0 - \beta_1 X_k - \cdots - \beta_P X_k^P \right)^2 $$

Figure 2.3 shows straight (P=1) and quadratic (P=2) lines
fit to the neighborhood points surrounding X6 in data set
two.


[Panels: LINEAR; QUADRATIC]


Figure 2.3 Linear and Quadratic Fits.


The choice of an appropriate P depends on the user's
perception of the relationship between the points within
each neighborhood, the need for flexibility to reproduce
patterns in the data, and computational ease. The existence
of physical theories that define the relationships as being
nonlinear might also influence this choice. Smoothed curves
based on higher order polynomial regressions tend to follow
abrupt pattern changes better than those based on linear
models. Cleveland [Ref. 3] feels that computational
considerations begin to override the need for flexibility
for values of P greater than 1.

The smoothing routine written for this thesis is capable
of performing linear or quadratic regressions. Using P = 1
or 2 should provide adequately smoothed points for any data
set.

The final step in the Locally Weighted Regression
portion of the LOWESS procedure is the determination of the
smoothed point (Xi,Ŷi), Figure 2.4, where:

$$ \hat{Y}_i = \sum_{j=0}^{P} \beta_j(X_i) X_i^j $$

The notation used here emphasizes that the coefficients of
the Xi are different for each point Xi.
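
The steps described so far (neighborhood, tricube weights, weighted least squares fit, evaluation at Xi) can be combined into one non-robust pass. The sketch below assumes P = 1 and is not the APL or FORTRAN code written for this thesis; the names `weighted_line` and `lowess_nonrobust` are hypothetical.

```python
def tricube(u):
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def weighted_line(xs, ys, ws):
    """Weighted least squares straight line; returns (b0, b1)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        # No spread in X among the weighted points: fit a constant.
        return ybar, 0.0
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def lowess_nonrobust(x, y, f):
    """One locally weighted regression pass (P = 1) over every point."""
    n = len(x)
    q = max(2, min(n, int(f * n + 0.5)))          # Q = (F x N), rounded
    fitted = []
    for i in range(n):
        # STEP ONE: the Q nearest neighbors of x[i], nearest first.
        idx = sorted(range(n), key=lambda k: (abs(x[k] - x[i]), k))[:q]
        d = abs(x[idx[-1]] - x[i])                # distance to Qth neighbor
        # STEP TWO: tricube weights (all 1 when every abscissa equals x[i]).
        ws = [1.0 if d == 0 else tricube((x[i] - x[k]) / d) for k in idx]
        # STEP THREE: weighted fit, then evaluate the line at x[i].
        b0, b1 = weighted_line([x[k] for k in idx], [y[k] for k in idx], ws)
        fitted.append(b0 + b1 * x[i])
    return fitted

# Hypothetical check: noise-free points on Y = 3 + 2X should be returned
# essentially unchanged, including the two end points.
x = [float(v) for v in range(20)]
y = [3.0 + 2.0 * v for v in x]
yhat = lowess_nonrobust(x, y, 0.5)
print(max(abs(a - b) for a, b in zip(y, yhat)))
```

A quadratic fit (P = 2) would need a three-parameter weighted regression in place of `weighted_line`; that extension is omitted to keep the sketch short.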





Figure 2.4 Scatter Plot of Data Set Two Superimposed
With Smoothed Point (X6,Ŷ6).


LOWESS differs from most other smoothing routines
because it smooths all of the data points. This becomes
important when smoothing small data sets, when important
pattern changes take place near the ends of the data set, or
when the smoothed curve is to be used as a regression line
to predict future trends. Figure 2.5 summarizes the sequence
of steps described above, as they are used to compute a
"fitted" value for (X20,Y20), the right hand end point of
data set two.

A comparison of Figures 2.1 and 2.5 reveals that the
widths of the vertical strips about (X6,Y6) and (X20,Y20)
are not equal. Note that the ten nearest neighbors of X20
are all to the left. Although both strips contain ten data
points, the requirement to center them around their
[Panels: one per step, followed by RESULT; axes WEIGHT and X]


Figure 2.5 Summary of Steps Required for Computing the
Smoothed Value at (X20,Y20) in Data Set Two.


respective (Xi,Yi) points forces the right hand portion of
the weighting function in Figure 2.5 to fall off-scale. The
left hand portion of the weighting function for (X1,Y1) is
forced off scale for the same reason. These partial
weighting functions still fulfill all of the requirements
outlined earlier, however. Unequal spacing of the X's also
creates variable strip widths.

A set of smoothed data points, Figure 2.6, is obtained
by completing the aforementioned steps for each point in the
original data set.
Figure 2.6 Plots of LOWESS Smoothed Data Points and
Smoothed Curve Superimposed on Data Set Two, (F=.5).


C. MATHEMATICAL DETAILS: ROBUST LOWESS SMOOTHING 


The robust smoothing feature of LOWESS prevents a small
number of outliers from distorting the smoothed curve. The
point (X10,Y10) in Figure 2.1 is one such outlier.

The robust procedure computes a new set of weights for
each (Xi,Yi) based on the size of the residuals, (Yi-Ŷi),
obtained after the first smoothing pass, Figure 2.7.

Cleveland [Ref. 3] suggests using a bisquare function
of the form:

$$ B(v) = \begin{cases} (1 - v^2)^2 & \text{for } |v| < 1 \\ 0 & \text{otherwise} \end{cases} $$

Robustness weights for each point are calculated by:

$$ D_i(v) = B\left(\frac{Y_i - \hat{Y}_i}{6M}\right) $$

where M is the median of the absolute value of the
residuals, Figure 2.8. This is sometimes referred to as the
Median Absolute Deviation (MAD).
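
A short sketch of the robustness weights follows. The scale factor of six times the median absolute residual is the one given by Cleveland [Ref. 3]; the function names, the sample residuals and the handling of a zero median are assumptions of this illustration.

```python
import statistics

def bisquare(v):
    """Bisquare weight: (1 - v^2)^2 for |v| < 1, otherwise 0."""
    return (1.0 - v * v) ** 2 if abs(v) < 1.0 else 0.0

def robustness_weights(residuals):
    """Robustness weight for each residual, scaled by 6 x median |residual|."""
    m = statistics.median([abs(r) for r in residuals])
    if m == 0.0:
        # Assumption of this sketch: with a zero median there is no usable
        # scale, so no point is downweighted.
        return [1.0 for _ in residuals]
    return [bisquare(r / (6.0 * m)) for r in residuals]

# Hypothetical residuals (Yi - fitted Yi); the fourth is an outlier.
resid = [0.1, -0.2, 0.1, 5.0, -0.1]
for r, d in zip(resid, robustness_weights(resid)):
    print(f"residual {r:5.1f} -> weight {d:.4f}")
```

Here the median absolute residual is 0.1, so any residual of 0.6 or more in absolute value receives weight zero, while small residuals keep weights close to one.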


[Plot: RESIDUALS versus X]


Figure 2.7 Residuals (Yi-Ŷi) Versus Xi for the
Non-Robust Smoothed Points of Data Set Two.


Figure 2.8 Robust Weight Function For the First
Pass Through Data Set Two.


This scheme gives small weights to points associated
with large residuals and large weights to points with small
residuals. One iteration of the robust locally weighted
regression procedure is completed by calculating a new set
of "fitted" values using the weighting function

WT = W(U) x D(V)

in step three.
Execution of the entire LOWESS algorithm, consisting of
one locally weighted regression pass and two robust locally
weighted regression passes, produces a robust smoothed curve,
Figure 2.9. The effect of the "outlier" can be seen very
clearly.
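
The complete sequence (one locally weighted pass followed by two robust passes) can be sketched as below for P = 1. This is an illustration under stated assumptions rather than the thesis program: the function names, the alternating +/-0.5 "noise", the single displaced point and the fallback used when every weight in a strip is zero are all hypothetical.

```python
import statistics

def tricube(u):
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def bisquare(v):
    return (1.0 - v * v) ** 2 if abs(v) < 1.0 else 0.0

def weighted_line(xs, ys, ws):
    """Weighted least squares straight line; returns (b0, b1)."""
    sw = sum(ws)
    if sw == 0.0:
        # Assumed fallback: all weights vanished, use the plain mean.
        return sum(ys) / len(ys), 0.0
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        return ybar, 0.0
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def smooth_pass(x, y, f, delta):
    """One pass using the combined weights WT = W(U) x D(V)."""
    n = len(x)
    q = max(2, min(n, int(f * n + 0.5)))
    fitted = []
    for i in range(n):
        idx = sorted(range(n), key=lambda k: (abs(x[k] - x[i]), k))[:q]
        d = abs(x[idx[-1]] - x[i])
        local = [1.0 if d == 0 else tricube((x[i] - x[k]) / d) for k in idx]
        ws = [w * delta[k] for w, k in zip(local, idx)]
        b0, b1 = weighted_line([x[k] for k in idx], [y[k] for k in idx], ws)
        fitted.append(b0 + b1 * x[i])
    return fitted

def robust_lowess(x, y, f=0.5, robust_passes=2):
    delta = [1.0] * len(x)                    # first pass: no downweighting
    fitted = smooth_pass(x, y, f, delta)
    for _ in range(robust_passes):
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        m = statistics.median([abs(r) for r in resid])
        if m == 0.0:
            break
        delta = [bisquare(r / (6.0 * m)) for r in resid]
        fitted = smooth_pass(x, y, f, delta)
    return fitted

# Hypothetical data: Y = X with alternating +/-0.5 deviations and one outlier.
x = [float(i) for i in range(20)]
y = [xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y[9] += 20.0

plain = smooth_pass(x, y, 0.5, [1.0] * len(x))
robust = robust_lowess(x, y, 0.5, 2)
print(plain[9] - x[9])     # non-robust fit is pulled toward the outlier
print(robust[9] - x[9])    # robust fit stays close to the line Y = X
```

Each robust pass repeats the N local regressions, which is why the robust curve costs roughly three times as many regressions as the non-robust one.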
[Panels: NON-ROBUST LOWESS F = .5; ROBUST LOWESS F = .5]


Figure 2.9 Comparison of Non-Robust and Robust LOWESS
Smoothing of Data Set Two, (F=.5).


Cleveland [Ref. 3] reports that the number of
computations required to complete the LOWESS algorithm on an
entire data set is on the order of FN². For example, 60
linear regressions were used to complete the robust smoothing
of the 20 artificial data points in Figure 2.9. The
non-robust curve, on the other hand, required 2/3 fewer
calculations and took less than 1/2 the time. The number of
calculations required to produce a smoothed curve presents no
significant problem for plots of fewer than 100 points.
Computational time can be saved by grouping the Xi's on data
sets that have repeated X values. This saving results from
the fact that if Xi+1 = Xi then Ŷi+1 = Ŷi. Assigning the
same Ŷi value to each of the Ni repeated Xi's reduces the
number of regressions required by Ni for non-robust
smoothing and by 3Ni for robust smoothing.


D. CHOOSING F


There are no set criteria for choosing F. Small values
produce curves with high resolution and a lot of noise.
Larger F's produce curves with low resolution and less
noise, but require increased computational time. In
general, increasing F tends to produce smoother curves,
Figure 2.10. Cleveland [Ref. 3] suggests that values
between .2 and .8 should be satisfactory for most purposes.
The goal is to choose the largest F that minimizes the
variability in the smoothed points without distorting
patterns in the data. Computational time may become a
consideration in choosing F when smoothing large data sets.
In general though, F will decrease as the series length
increases.


ROBUST LOWESS F = .2 ROBUST LOWESS F = 3 





Figure 2.10 Comparison of Robust LOWESS Smoothing of 
Data Set Two for Different Values of F. 
Smoothing routines, LOWESS included, do not provide
regression equations or other analytical results on which to
test goodness of fit. The user must judge the adequacy of
the results. The choice of F is not so critical for cases in
which the purpose of the smoothing is to enhance the visual
perception of gross patterns in the data. For example, the
rough curve obtained by using F=.2 on data set two, the left
hand plot of Figure 2.10, provides an adequate picture of an
overall increasing trend. More care must be taken in some
applications, such as time series analysis, or when the
smoothed (Xi,Ŷi) values may be used as a type of regression
line, or when the smoothed curve may be presented without an
accompanying plot of the original data points. Taking F=.5
is a reasonable choice when there is no clear idea of what
is needed [Ref. 3]. Chambers [Ref. 1] suggests that it is
often wise to try several values of F before selecting the
"best" one for a particular application.

Techniques for determining bandwidth using
cross-validation have been considered by Cleveland [Ref. 3]
and Rice [Ref. 9], but are not included here.
III. EVALUATION OF THE LOWESS CURVE SMOOTHING PROGRAM


A. GENERAL 


Smoothing routines are generally used to filter noisy
data and approximate underlying relationships that may be
too complex to describe mathematically or too difficult to
fit by simple polynomial regression. Effective routines
must be flexible and local. They must allow the data to
determine the shape of the smoothed curve and they must be
able to follow abrupt as well as smooth changes in
curvature. This evaluation will test LOWESS in each of these
areas.


B. METHODOLOGY


LOWESS, like most other curve smoothing schemes,
provides no analytical solutions by which to measure its
effectiveness. The correctness or adequacy of the fit must
be judged subjectively, and there are no standard guidelines
to follow. Sometimes the shape of the fit can be checked by
comparing it to the physical laws that govern the
application at hand. The programs written to support this
thesis were evaluated by:

1. examining their performance on a set of test data for
which the underlying functional relationships were
known;

2. comparing their results with those obtained from
widely used and previously validated curve smoothing
techniques, namely: LEAST SQUARES REGRESSION, MOVING
AVERAGE and COSINE ARCH weighted smoothing.

The theory of moving average procedures dates back to
definitive studies of discrete time series models completed
by H. Wold in the mid 1930's. The general process is based
on the assumptions and theories recounted in Chapter I. The
moving average is defined by the expression


$$ X(t) = \sum_{j=-M}^{N} A_j Z(t-j) $$


where M and N are nonnegative integers and the weighting
coefficients Aj are real constants. Kendall and Stuart
[Ref. 4], and Koopmans [Ref. 10], present in depth
discussions and theoretical derivations that expand on the
ideas presented in Chapter I. The moving average routine
employed in this analysis is contained in the IBM GRAFSTAT
statistical graphics package. The weighting function used in
that program takes the form


Aye J=-M...N 


The COSINE ARCH smoothing procedure used here is a
moving average process that uses a cosine weighting function
of the form


ee — cos 24) J=0,1... N-1 
M+] +] 


It is characterized as a good smoother by Anscombe
[Ref. 11], and is often used as a trend remover during time
series analysis.


C. TESTING PROCEDURES AND RESULTS 


Three sets of test data were developed to check all
aspects of the LOWESS program's capabilities: its ability to
follow linear trends as well as abrupt and smooth changes in
curvature.


Test set one, Figure 3.1, consisting of 150 data
points having the following functional relationship:

Y = X + NORMAL(0,1) NOISE          0<X<10

was designed to test LOWESS' ability to detect linear trends
in noisy data. Although this test appears redundant, many
complex smoothing procedures have failed because they did
not return straight lines when that was the shape of the
underlying curve.
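
A data set with the same structure as test set one can be generated with the sketch below. The actual X values and noise draws used in the thesis are not reproduced here; evenly spaced X values, the seed and the function names are assumptions. The ordinary least squares line is included as the benchmark against which a smoother would be compared.

```python
import random

def make_test_set_one(n=150, seed=1):
    """Y = X + Normal(0,1) noise for n points with 0 < X < 10.

    Evenly spaced X values are an assumption of this sketch.
    """
    rng = random.Random(seed)
    xs = [10.0 * (k + 0.5) / n for k in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def least_squares_line(xs, ys):
    """Ordinary least squares straight line; returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

xs, ys = make_test_set_one()
b0, b1 = least_squares_line(xs, ys)
print(f"Y = {b0:.4f} + {b1:.4f} * X")   # slope should be close to 1
```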





Figure 3.1 Test Set One With and Without N(0,1) Noise. 


The adequacy of LOWESS' performance on test set one
was measured by comparing it with a linear least squares
regression line fitted to the same data.

As pointed out in Chapter II, LOWESS produces
increasingly smoother curves as the parameter F approaches
1. When F=1, each neighborhood used throughout the smoothing
process contains N x 1 = N points. This implies that each
smoothed point (Xi,Ŷi) is computed from the equation of the
TRICUBE weighted regression line fitted to all of the data.
This procedure should produce a LOWESS smoothed curve that
closely resembles the linear regression of Y on X. The
TRICUBE weighting function used in LOWESS may cause minor
disparities between the two "fits," however. A visual
inspection of the bottom two plots in Figure 3.2 reveals
that LOWESS and the linear regression produced nearly
identical "fits."


[Panels: LOWESS F=.2; LOWESS F=.5; fitted line
Y = -0.16524 + 1.0143 * X]


Figure 3.2 Comparison of LOWESS Smoothing and Linear
Regression of Test Set One.


Goodness of fit can be measured by examining the
residuals (Yi-Ŷi) from each smoothing procedure. A perfect
reproduction of the underlying functional relationship, Y =
X, would produce a set of residuals distributed Normal(0,1),
the same distribution found in the noise. The results of the
GRAFSTAT distribution fitting procedure summarized in Table
II indicate that the distribution of the regression
residuals can be approximated as Normal(0,1.04) while the
LOWESS residuals are approximately Normal(.002,1.016).

Hypothesis tests comparing the means and variances
of these distributions with those of the Normal(0,1)
distributed noise will provide some measure of the goodness
of fit of each smoothing scheme. The results of these
tests, conducted at the 95% confidence level, are summarized
in Table I.

The output from the GRAFSTAT distribution fitting
procedure presented in Table II and the hypothesis tests
summarized in Table I suggest that there is no significant
difference between the distribution of the residuals from
the linear regression or LOWESS smoothing of test set one,
and the Normal(0,1) noise incorporated into the data. This
provides strong support for the premise that LOWESS depicts
linear trends very well. Visual comparison of the LOWESS
smooths in Figure 3.2 confirms that LOWESS follows the same
general trend regardless of what F is used; small values
provide rougher curves that have the same general slope.


TABLE I 


Comparison of the Means and Variances of Residuals
From Smooths of Test Set One to the Normal(0,1) Noise


nce1se i 2(1-%72) 6 
1 aeGe DE 


0 
1 accept 
0 
1 


o 
1.3 
Lge, acce pt 
is acce pt 
TABLE II


Summary of GRAFSTAT Distribution Fitting of
Residuals from Regression and LOWESS Smooths of Test Set One


RESIDUALS FROM LINEAR REGRESSION 


NORMAL DISTRIBUTION 


x : BESO 
SELECTION : ALL 
LABEL : RESO 
SAMPLE SIZE: 150 
MINT MUM : ~2.846 
MA ] MUM : 5 15t 
CENSORING ;: NONE 
EST. METHOO: MAXIMM LIKELIHOOO 
SAMPLE FITTED COVARIANCE MATRIX OF 
ME AN : 2.0898€7"14 2.0898€7-14 PARAMETER ESTIMATES 
S'D DEY =< 120293720 1 0295£0 bat S1Gua 
SKEWNESS: 1.1908€71 0 O000EO MU 0 0070189 O 
KURTOSIS: 3.1359€0 3. 0000E0 SIGuw 0 0 003555 
PERCENTILES SAMPLE FITTED GOOONESS OF FIT 
Si “1.7375 ~-1,693860 CHI-SQUARE 2.3078 
10: = 1-336! ~1.3196£0 O€EG FREEO: = 
Zoe “0.59132 76.9409£71 SIGNIF 04805135 
50: "0.032298 1 O399E-? KOLM-SMIRN- : 0 040266 
Pe 0.63234 6.9409E 71 SIGNIF : 0 96816 
90: 1.3208 1.3196£0 CRAMER-V Me: 0.027624 
95: 1.7182 1.6938€0 SIGNIF : > “5 
ANODER-DARL 0 17006 
SIGNIF a1 5 


KS, AD, ANO CV SIGNIF. LEVELS NOT EXACT WITH ESTIMATEO PARAMETERS 


0.95 CONFIOENCE INTERVALS 


PARAMETER ESTIMATE LOATR UPPER 
Mu 2.0898£-14 -O 16424 0.16424 
SIGMA 1.0295£0 0.9247) 1.1613 


RESIDUALS FROM LOWESS SMOOTHING 


NORMAL DISTRIBUTION 


x : LOWESS RES/OUALS 
SELECTION :) ALL 
LABEL > LOWRES 
SAMPLE SIZE: 150 
MIN [MUM : "2.909 
MAY | UM : 3 090 
CENSORING : NONE 
EST. METHOO: MAXIMUM LIKELIHOOO 
SAMPLE FITTED COVARIANCE MATRIX OF 
ME AN : 0 016268 O 016268 PARAMETER ESTIMATES 
STO DEV : 1 0237 1 0237 MU SiGiMAA 
SKEWNESS: 0 093313 O MU 0 0069398 0 
KURTOSIS: 3 1452 3 SIGMA 0 0.0034932 
PERCENTILES SAMPLE FITTED GOOONESS OF FIT 
as —1 6646 -—1.6679 CHI-SQUARE 1.4385 
10: “e550 71.2958 O&G FREEO: 5 
250 “0.55317  -0 6739 SIGNIF 0.92006 
50: 0.010179 O 016268 KOLM-SMIRN : 0 047238 
75: 0.64998 0 70643 SIGNIF : O 89136 
90: 1.2874 1 3284 CRAMER-~Vv M : 0 03063! 
95: 1.7125 1 7005 SIGNIF So tS 
ANOE R-OARL 0 18198 
SIGNIF : StS 


KS, AD. AND CV SIGNIF. LEVELS NOT EXACT WITH ESTIMATED PARAMETERS 


0 95 CONFIOENCE INTERVALS 


PARAMETER ESTIMATE  LOWFR UPPER 
MY 0.016268 -O 14704 © 17958 
S1GMa Vo02357 0 91946 1 1546 
Test set two, Figure 3.3, consisting of 220 data
points having the following mathematical relationship

$$ Y = \begin{cases}
0.4X + \text{NORMAL}(0,1)\ \text{NOISE} & 0 < X < 10 \\
3 + 0.1X + \text{NORMAL}(0,1)\ \text{NOISE} & 10 < X < 25 \\
14.6 - 0.367X + \text{NORMAL}(0,1)\ \text{NOISE} & 25 < X < 40 \\
0 + \text{NORMAL}(0,1)\ \text{NOISE} & 40 < X < 44
\end{cases} $$

was used to test LOWESS' ability to handle abrupt pattern
changes. The smooth of test set two generated by LOWESS was
compared to those produced by MOVING AVERAGE and COSINE ARCH
filtering of the same data.
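
A piecewise data set of the same form as test set two can be sketched as follows. The slope of the third segment is taken here as 0.367, which is an assumption because the printed coefficient is hard to read; evenly spaced X values, the seed and the function names are also assumptions.

```python
import random

def g_test_set_two(x):
    """Assumed noise-free relationship underlying test set two."""
    if x < 10.0:
        return 0.4 * x
    if x < 25.0:
        return 3.0 + 0.1 * x
    if x < 40.0:
        return 14.6 - 0.367 * x     # slope value is an assumption
    return 0.0

def make_test_set_two(n=220, seed=2):
    """n points with 0 < X < 44 and Normal(0,1) noise added to G(X)."""
    rng = random.Random(seed)
    xs = [44.0 * (k + 0.5) / n for k in range(n)]
    ys = [g_test_set_two(x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

xs2, ys2 = make_test_set_two()
# The break points at X = 10, 25 and 40 are where the slope changes abruptly.
for x in (5.0, 10.0, 20.0, 25.0, 30.0, 42.0):
    print(x, round(g_test_set_two(x), 3))
```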







Figure 3.3 Test Set Two With and Without N(0,1) Noise. 


Determining the amount of smoothing required by a data set
is, perhaps, the most difficult aspect of using any curve
smoothing routine. Smoothness is controlled by the size of
the parameter F in LOWESS and by the parameter M (bandwidth)
in MOVING AVERAGE and COSINE ARCH smoothing. These parameters
determine the number of points, or neighborhood size, used to
compute each smoothed value. The goal, regardless of the
method chosen, is to use the largest neighborhood that
minimizes the variability in the smoothed points without
distorting patterns in the data. Another factor that must
also be considered when choosing M is that MOVING AVERAGE and
COSINE ARCH smoothing routines produce only (N-M) smoothed
points. Using proportionately large values of M, therefore,
might result in losing significant portions of the original
pattern at the ends. This shortcoming will be evident in the
graphical comparisons made throughout the remainder of this
chapter.

Comparison tests made during phases two and three of this
evaluation used selected LOWESS smooths and corresponding
MOVING AVERAGE and COSINE ARCH smoothed curves. Parameters
for the three processes are directly convertible by the
relationship M = F·N.
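
The conversion between the two parameters, and the loss of
points at the ends of a moving average, can be illustrated
with a short Python sketch. The function names are
hypothetical, and the unweighted window of M consecutive
points is an assumed form of the MOVING AVERAGE routine; the
exact number of points lost depends on how the window is
defined.

```python
def bandwidth_from_f(f, n):
    """Neighborhood size M corresponding to the LOWESS fraction F: M = F * N."""
    return int(round(f * n))


def moving_average(y, m):
    """Unweighted average over every window of m consecutive points."""
    return [sum(y[i:i + m]) / m for i in range(len(y) - m + 1)]


n = 220
m = bandwidth_from_f(0.25, n)
smoothed = moving_average([float(k) for k in range(n)], m)
print(m, len(smoothed))
```

For N = 220 and F = .25 this gives M = 55, and the window
form used here returns 166 smoothed values, so roughly a
quarter of the X range is left without a smoothed point.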

Figure 3.4 presents graphical comparisons of LOWESS smooths
(solid line) using parameter values F = .15, .25, .50 and .75
to illustrate some of the considerations made during the
parameter selection phase of a smoothing operation. The exact
underlying relationships (dashed lines) were included to
demonstrate how large values of F can cause pattern
distortion.

It is apparent from the sequence of illustrations in Figure
3.4 that LOWESS produces smoother curves as F increases. The
smoothest curves are not always the most desirable, however.
The bottom two curves (F=.50 and F=.75) have distorted the
original pattern by using too many points to compute the
smoothed values. Test set two contains 50 points in the
segment (0 ≤ X ≤ 10). Using a neighborhood much larger than
220 × .25 = 55 points on this data set would have a tendency
to fit the wrong slope to the first linear segment.
Additionally, it would cause over smoothing of the corners.
Figure 3.5 shows the neighborhood and linear regression used
to smooth the point (X10,Y10) during production of the
smoothed curve (F=.75) pictured in the lower right corner of
Figure 3.4. It is easy to see that following this slope would
distort the pattern presented by the data.
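
The effect of neighborhood size on a single smoothed value
can be examined with the Python sketch below. It is a minimal
illustration, not the program evaluated here. It assumes the
usual LOWESS construction: the F·N nearest points, tricube
weights and a weighted least squares line, with no robustness
iterations. The noise-free test data, the 0.2 spacing and the
choice of the point at X = 10 are also assumptions.

```python
def lowess_point(x, y, i, f):
    """Locally weighted linear fit at x[i]; returns (fitted value, local slope)."""
    n = len(x)
    q = max(2, min(n, int(round(f * n))))      # neighborhood size
    dist = [abs(xj - x[i]) for xj in x]
    h = sorted(dist)[q - 1]                    # distance to the q-th nearest point
    # tricube weights: near 1 close to x[i], falling to 0 at distance h
    w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
    sw = sum(w)
    xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
    if sxx == 0.0:                             # only one point carries weight
        return ybar, 0.0
    sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx
    return ybar + slope * (x[i] - xbar), slope


def segment(x):
    """Assumed noise-free form of test set two."""
    if x <= 10:
        return 0.4 * x
    if x <= 25:
        return 3 + 0.1 * x
    if x <= 40:
        return 14.67 - 0.367 * x
    return 0.0


xs = [k / 5 for k in range(1, 221)]
ys = [segment(x) for x in xs]
i = xs.index(10.0)
for f in (0.25, 0.75):
    fitted, slope = lowess_point(xs, ys, i, f)
    print(f, round(fitted, 3), round(slope, 3))
```

Comparing the printed local slopes with the true slope of .4
in the first segment shows how far each neighborhood reaches
into the adjoining segments.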


Figure 3.4 Comparison of LOWESS Smoothing of Test Set Two
Using Different Values of the Parameter F.

Figure 3.5 Linear Regression Used in Smoothing (X10, Y10)
in Test Set Two Using LOWESS With F=.75.


The F=.15 plot depicted in Figure 3.4 demonstrates that
small F's create very locally smoothed curves that contain a
great deal of noise but follow gross patterns well. Using a
small F is an excellent idea if the sole purpose of the
smoothing is to highlight major trends in the data.

The LOWESS smoothed curve obtained by using F=.25 is the one
best suited for comparison with corresponding MOVING AVERAGE
and COSINE ARCH smooths, Figure 3.6.


Figure 3.6 Comparison of LOWESS, MOVING AVERAGE
and COSINE ARCH Smoothing of Test Set Two.


Inspection of the plots in Figure 3.6 reveals that all of
the smoothing procedures fit similarly shaped curves to most
of the data. The inability of the MOVING AVERAGE and COSINE
ARCH routines to smooth the extreme edges of a plot precluded
them from fitting a curve to the last segment of test set
two. Practitioners of these routines often extend the curve
or fit the ends by eye. Applying these techniques to the
bottom curves in Figure 3.6 does not reveal any significant
pattern changes. LOWESS, although it does not follow the
level trend accurately, does reveal a major pattern change in
the last section of the data.

All three of the procedures have a tendency to round sharp
corners as the parameters F and M are increased. The MOVING
AVERAGE curve, in the lower left, has a very rounded shape
and does not highlight the linear trend in segments one or
two. The COSINE ARCH filter does a little better. It portrays
the linearity of section three with nearly the correct slope
but fits segments one and two with one smooth curve.
Additionally, it has added a misleading hump at the
intersection of segments two and three. LOWESS is the only
procedure that clearly pictures the underlying pattern as a
series of straight lines. An experienced user who understands
that LOWESS rounds corners could almost duplicate the
original pattern by connecting the linear portions of the
curve.

Smoothing procedures are not only judged on their ability to
depict patterns, but are also rated on their ability to
filter out unwanted noise. Gross differences in their
capabilities can be picked out easily in a graphical
comparison. It is readily apparent that the MOVING AVERAGE
curve in Figure 3.6 is much noisier than either the LOWESS or
COSINE ARCH smooths.

A more analytical measure of a procedure's smoothing ability
can be made by comparing periodograms of the unfiltered and
filtered data. A periodogram is an analysis technique used to
estimate the spectral density function of a time series at
periodic frequencies, ν. The periodogram function is defined
by

\[
I_N(\nu) = \frac{1}{2\pi N} \left| \sum_{t=1}^{N} X_t e^{-i\nu t} \right|^2
\]

Refer to Koopmans [Ref. 10], chapter 8, for a detailed
discussion of the periodogram and its distributional
properties.

Figure 3.7 Comparison of Periodograms of LOWESS, MOVING
AVERAGE and COSINE ARCH Smoothing of Test Set Two.
(Panels: TEST SET TWO WITH NOISE; TEST SET TWO WITHOUT NOISE;
LOWESS F = .2; MOVING AVERAGE M = 44; COSINE ARCH M = 44.
Horizontal axis: FREQUENCY.)

The periodograms in Figure 3.7 provide comparisons of the
filtering properties of each smoothing routine. The vertical
lines on each plot represent periodicities, the spectral
frequencies of which are measured along the abscissa. The
height of the lines is an indicator of the significance of
the associated frequencies. The plots in Figure 3.7 were
truncated at Y = 6 to prevent the obscuration of the minor
frequencies.
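
A periodogram of the kind compared above can be computed
directly from its definition. The Python sketch below is a
minimal illustration under stated assumptions: the 1/(2πN)
scaling is one common convention and may differ from the one
used to draw the figure, the mean is removed before
transforming, and the ordinates are evaluated at the
frequencies 2πk/N. The function name is hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Periodogram ordinates at the frequencies 2*pi*k/N, k = 1 .. N//2."""
    n = len(x)
    mean = sum(x) / n  # assumption: remove the mean before transforming
    ordinates = []
    for k in range(1, n // 2 + 1):
        nu = 2 * math.pi * k / n
        s = sum((xt - mean) * cmath.exp(-1j * nu * t)
                for t, xt in enumerate(x, start=1))
        ordinates.append(abs(s) ** 2 / (2 * math.pi * n))
    return ordinates


# A pure cosine completing 5 cycles in 64 points should show one dominant line.
n = 64
series = [math.cos(2 * math.pi * 5 * t / n) for t in range(1, n + 1)]
p = periodogram(series)
peak = max(range(len(p)), key=p.__getitem__) + 1
print(peak)
```

Applying the same function to the raw and the smoothed Y
values gives a numerical counterpart to the visual
comparison: a smoother curve leaves less height at the higher
frequencies.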

A visual inspection of these periodograms reveals that
LOWESS produces the smoothest (most noise free) curve. In
fact, the periodograms of the LOWESS curve and the noise free
data are nearly identical.

All of this evidence supports the conclusion that LOWESS
performs at least as well on data sets that contain abrupt
changes in curvature as do the widely accepted MOVING AVERAGE
and COSINE ARCH procedures.


3. Phase Three: Smooth Changes in Curvature


Test set three, Figure 3.8, comprised of 100 data points
having the following relationship

\[
Y = \sin X + \text{NORMAL}(0,1)\ \text{NOISE}, \qquad 0 < X < 2\pi
\]

was used to evaluate LOWESS' ability to follow smooth
changes in curvature. The same procedures used in the
preceding section to test LOWESS' ability to handle abrupt
pattern changes were applied here.

Test set three appears either to have a negative linear
trend or to cycle about the line Y = 0. A series of smooths,
Figure 3.9, starting with a small F parameter, was used to
discover the general pattern (dashed line) and refine the
resulting smoothed curve (solid line). The distorted smooth
in the lower right hand plot demonstrates the inherent danger
in selecting a large F if only one smoothing pass is planned.
Figure 3.8 Test Set Three With and Without N(0,1) Noise.

Figure 3.9 Comparison of LOWESS Smoothing of Test Set
Three Using Different Values of the Parameter F.


The LOWESS curve obtained by using F=.25 provided the most
smoothing without distorting the pattern and was used in a
direct comparison with corresponding MOVING AVERAGE and
COSINE ARCH smooths, Figure 3.10. The LOWESS smooth is the
only curve that has the characteristic sinusoidal shape. The
MOVING AVERAGE plot, although very noisy, would present the
proper picture if the ends of the curve were extended. The
radical change in curvature on the left end of the COSINE
ARCH smoothed curve detracts from its ability to represent
the true shape of test set three.





Figure 3.10 Comparison of LOWESS, MOVING AVERAGE and
COSINE ARCH Smoothing of Test Set Three.
(Panels include MOVING AVERAGE M = 25 and COSINE ARCH M = 25.)


Comparison of the periodograms presented in Figure 3.11
shows, once again, that LOWESS produces the smoothest curve,
while Figure 3.10 shows that it seems to follow the model the
best.
Figure 3.11 Comparison of Periodograms of LOWESS, MOVING
AVERAGE and COSINE ARCH Smoothing of Test Set Three.

The graphical comparisons made in Figure 3.10 and 3.11
demonstrate clearly that LOWESS performs at least as well as
MOVING AVERAGE and COSINE ARCH routines when smoothing data
that has a smooth curvilinear pattern.


4. Phase Four: Unequal Spacing


Besides being able to smooth all of the data points, LOWESS
enjoys another possible advantage over MOVING AVERAGE type
procedures, in that it was designed to work on unequally as
well as equally spaced data. The definition of MOVING
AVERAGES

\[
\bar{Y}_i = \frac{1}{2M+1} \sum_{j=i-M}^{i+M} Y_j , \qquad i = 0, 1, 2, \ldots
\]

holds only if the Yi's are equally spaced and have a linear
relationship over the interval (i-M) ... (i+M). Violation of
the linearity assumption introduces bias into the results
while violation of the equal spacing requirement invalidates
them. LOWESS would indeed enjoy a distinct advantage over
MOVING AVERAGE type smoothing procedures if it produces
acceptable results on irregularly spaced data.

This section examines LOWESS' ability to smooth two
different sets of this type of data. The first, natural log
of energy dissipation versus depth, Figure 3.12, is a
transformed portion of data collected during a turbulence
measuring experiment conducted by the Department of
Oceanography, U.S. Naval Postgraduate School.

The LOWESS curves obtained by using linear and quadratic
regressions during Step Three of the smoothing procedure were
compared to a quadratic least squares regression line fit to
the same data, Figure 3.13. Higher order regressions were
rejected as plausible solutions because the regression
coefficients Bj, j = 3,4,5... were found to be statistically
insignificant compared to the Bj, j = 0,1,2 constants. A
quadratic relationship also seemed to be a reasonable
assumption since turbulence is a function of pressure which
varies in proportion to depth squared.

Figure 3.12 Natural Log of Energy Dissipation vs Depth.


QUADRATIC REGRESSION

Y = +/C × X*0 1 2    WHERE: C = -12.512 0.33412 -0.0055612

ANALYSIS OF VARIANCE TABLE

SOURCE                    SS           DF     MS        F
GRAND MEAN (SEE NOTE)     10275.656      1
REGRESSION                   28.970      2    14.485    32.500
RESIDUAL                     73.094    164      .446
TOTAL                     10377.719    167    62.142

THE SIGNIFICANCE LEVEL OF REGRESSION = .0000
(SIGNIFICANCE LEVEL = AREA UNDER CURVE BEYOND COMPUTED F)
R SQUARE (SEE NOTE) = .284

NOTE: IN WEIGHTED CASE, SEE DESCRIPTION FOR MEANING

Figure 3.13 Quadratic Regression and Analysis of Variance
Table for Ln Energy Dissipation Versus Depth.

Figure 3.14 shows that the LOWESS curves (solid lines) for
the linear (P = 1) smooths follow the general quadratic
regression (dashed lines) for small values of F but flatten
the pattern for large F's. The quadratic (P = 2) LOWESS
curves close in on the regression line as F increases and
produce a fairly good match as F reaches .75.

The quadratic LOWESS curve also appears to follow local
peaks and valleys more accurately for small F's than does its
linear counterpart. This is not unexpected. Figure 3.15 shows
that the characteristically bowed shape of a quadratic curve
produces larger Yi values in the middle of a data set (Xi is
located in the middle of the LOWESS neighborhood) than a
straight line fitted to the same data.

The "fits" of Figure 3.14 can be compared analytically, as
was done in the Phase One test, by examining the distribution
of their residuals. Combining these analytical results with
graphical comparisons provides some goodness of fit measure
for the two curves. The nonparametric Smirnov two sample test
[Ref. 12] is appropriate in this case because the
distribution of the residuals is unknown. The results of this
test conducted at the 95% confidence level, Table III,
indicate that there is no significant statistical difference
between the F=.75 quadratic LOWESS curve and the quadratic
least squares regression line. See the lower right hand plot
of Figure 3.14.
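
The two sample comparison of residuals can be sketched in
Python as follows. This is an illustration only. The function
names are hypothetical, and the critical value uses the
common large-sample approximation 1.36·sqrt((n+m)/(nm)) for a
two-sided test at the 5% level, which may differ slightly
from the tabled value used for Table III.

```python
import math


def smirnov_statistic(a, b):
    """Largest vertical gap between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:   # step past every value equal to v
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def critical_value_05(n, m):
    """Large-sample approximation to the two-sided 5% critical value."""
    return 1.36 * math.sqrt((n + m) / (n * m))


# Hypothetical residuals standing in for the LOWESS and regression residuals.
res_a = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
res_b = [-1.0, -0.5, 0.0, 0.4, 0.8, 1.7]
d = smirnov_statistic(res_a, res_b)
print(d, d > critical_value_05(len(res_a), len(res_b)))
```

The hypothesis of a common distribution is rejected only when
the statistic exceeds the critical value.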


This example demonstrates that LOWESS works quite well on
unequally spaced data. It also shows that quadratic LOWESS
works better than the linear model when neighborhood sizes
are too large to support the assumption that the neighborhood
points are related linearly. Quadratic LOWESS should be used
whenever the data suggest that that assumption is not true.
Figure 3.14 LOWESS Smoothing of Energy Dissipation Data
Using Linear and Quadratic Regressions in Step Three.
(Plot title: ROBUST LOWESS SMOOTHING: ENERGY DISSIPATION
DATA. Panels: LINEAR and QUADRATIC, F = .2, .5 and .75.
Axes: LOG ENERGY DISSIPATION versus DEPTH IN METERS.)

The second irregularly shaped plot to be smoothed, a lag-1
plot of 200 NEAR(1) random variables, is pictured in Figure
3.16.
Figure 3.15 LOWESS Smoothing of Ln Energy Dissipation
Data Using Linear and Quadratic Regressions in Step Three.


TABLE III

Smirnov Test Comparing the Distribution of
Residuals from Smoothing and Regression of Energy Data

[table entries illegible]





The NEAR(1) process, derived by Lawrance and Lewis
[Ref. 13], is a new first order autoregressive time series
model with exponentially distributed marginals. NEAR(1) data
is generated as a simple linear combination of a series, En,
of independent exponential random variables by the model

\[
X_n = \epsilon_n +
\begin{cases}
B X_{n-1} & \text{w.p. } A \\
0 & \text{w.p. } (1-A)
\end{cases}
\qquad n = 0, 1, 2, \ldots
\]
Figure 3.16 Lag-1 Plot of NEAR(1) Random Variables
Having Autocorrelation .75.


These NEAR(1) variables have some interesting properties
that make them especially suitable for testing smoothing
routines. They have fixed serial lag-1 correlation, ρ = AB,
and have conditional expectation

\[
E[X_n \mid X_{n-1} = x] = (1 - AB)\lambda^{-1} + ABx
\]

The following parameters were used to generate the variables
for the test: A = .83, B = .9, λ = 1. A successful smooth of
Figure 3.16 should produce a straight line of the form

\[
Y = .25 + .75X
\]

not at all what one would expect from looking at the plot.
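
A series with these properties can be simulated for
experimentation. The Python sketch below is illustrative and
is not the generator used for Figure 3.16. It assumes the
mixed-exponential innovation usually quoted for the NEAR(1)
model, which keeps the marginal distribution exponential;
[Ref. 13] should be consulted for the authoritative
construction. The function names and the seed are
hypothetical.

```python
import random


def near1(n, a, b, lam=1.0, seed=1):
    """Generate n NEAR(1) values with exponential marginals of mean 1/lam."""
    rng = random.Random(seed)
    shrink = (1 - a) * b
    p_full = (1 - b) / (1 - shrink)   # chance the innovation is a full exponential
    x = rng.expovariate(lam)          # start from the stationary distribution
    out = []
    for _ in range(n):
        e = rng.expovariate(lam)
        eps = e if rng.random() < p_full else shrink * e
        x = eps + (b * x if rng.random() < a else 0.0)
        out.append(x)
    return out


def conditional_mean(x_prev, a, b, lam=1.0):
    """E[X_n | X_(n-1) = x_prev] = (1 - AB)/lam + AB * x_prev."""
    return (1 - a * b) / lam + a * b * x_prev


xs = near1(200, 0.83, 0.9)
lag1_pairs = list(zip(xs[:-1], xs[1:]))   # the points of a lag-1 plot
print(conditional_mean(0.0, 0.83, 0.9), conditional_mean(1.0, 0.83, 0.9))
```

With A = .83 and B = .9 the conditional mean has intercept
.253 and slope .747, which is the line Y = .25 + .75X after
rounding.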
Figure 3.17 presents comparison plots of robust and
non-robust linear regression and robust and non-robust LOWESS
smoothing of the NEAR(1) data of Figure 3.16. The robust
regression function contained in the IBM GRAFSTAT package was
used in this example.

Examination of the plots in Figure 3.17 shows, once again,
that LOWESS smooths are comparable to those produced by
accepted linear regression techniques. It also reveals that
neither the linear regression nor LOWESS procedures were able
to reproduce the true lag-1 relationship, (Y = .25 + .75X),
shown in the lower right hand plot. Both robust curves do
present an accurate picture of where most of the data points
lie, and could be used to predict where a majority of the
future points are likely to fall. Relying on these curves,
however, would probably lead to the conclusion that the
points above and below these lines represent outliers, which
may or may not be the case.

It must be concluded from LOWESS' performance on these two
data sets, however, that it smooths unequally spaced data as
well as currently available regression techniques.


Figure 3.17 Comparison of Robust and Non-Robust Linear
Regression and LOWESS Smoothing of the Lag-1 Plot
of NEAR(1) Data.
(Panel annotations: LINEAR REGRESSION, Y = 0.435732 + 0.5621 × X;
ROBUST LINEAR REGRESSION, Y = 0.1384 + 0.9025 × X.)
IV. USING THE APL VERSION OF LOWESS


A. OVERVIEW


This chapter provides prospective users with detailed
instructions for using LOWESS as a stand-alone program or in
combination with the experimental GRAFSTAT graphics package.
In either mode, LOWESS will provide the user with vectors of
robust or non-robust smoothed Yi values and their associated
residuals. When used in conjunction with GRAFSTAT, it will
also produce a scatter plot of the original data with the
LOWESS smoothed curve superimposed. A similar presentation of
the absolute value of the residuals versus Xi is also
available on request from the program, Figure 4.1.


Figure 4.1 Sample of Graphics Outputs from LOWESS:
Smooths of the Data (left), and Residuals (right).
(Plot title: NON-ROBUST LOWESS SMOOTHING, F = .7.
Horizontal axes: TOTAL USERS.)


LOWESS is a completely interactive program. All user defined
parameters and option selections are entered in response to
program queries. The stand-alone and combined graphics modes
of operation are differentiated only by their initial set up
procedures and by the choice of terminals on which the
program is run.
Although no APL programming skills are required to operate
LOWESS, users should become familiar with system commands and
procedures for entering the APL environment, loading and
copying workspaces and variables and for saving workspaces by
reading appropriate sections of [Ref. 14]. Operating
instructions presented in the follow-on sections of this
chapter have been written for users who have had little or no
experience with APL. Experienced users may find it more
convenient to refer to the summarized procedures presented in
the Tables at the end of this chapter.

LOWESS is not a W.R. Church computer center supported
program and is not included in any of the APL libraries
listed in [Ref. 15]. Interested users should contact
Professor P.A.W. Lewis, Department of Operations Research,
U.S. Naval Postgraduate School, for information concerning
access to the APL workspace DINLFNS. This workspace, which
contains LOWESS and several other data analysis related
programs, should be copied and stored on the user's A disk.


B. TERMINAL REQUIREMENTS


LOWESS, in the stand-alone mode, can be run on any APL
capable terminal at the U.S. Naval Postgraduate School. The
IBM GRAFSTAT software, which generates the graphical displays
when operating LOWESS in the combined graphics mode, requires
the use of either IBM 3277GA or 3278/79 graphics display
terminals. The 3278 terminals require special modification to
produce graphical displays. None of these terminals are
available for public use at the Naval Postgraduate School.
See Table IV for a summary.


C. PROGRAM INITIALIZATION: STAND-ALONE MODE 


Since LOWESS is written in APL, users must enter the APL
sub-environment after completing normal log on procedures.
This is done by typing the letters "APL" and depressing the
enter key. The response "CLEAR WS" indicates that the
computer is ready to accept APL commands.

APL uses a special character set that is invoked by keying
the APL ON/OFF key while depressing the ALT key on IBM
3278/79 terminals or by merely hitting the APL ON/OFF key on
the 3277GA graphics display terminals. These special APL
characters are imprinted in red (3278/79 terminals) or black
(3277GA terminals) on the top and front surfaces of the
normal keys. The symbols located on the front of the keys are
accessed by typing the appropriate key while depressing the
APL ALT key. When two APL characters are pictured on the top
surface of the same key, the uppermost character is invoked
by hitting that key while depressing the SHIFT key, much the
same as producing capital letters during normal typing
operations.

The final step in the initialization procedure consists of
loading LOWESS and associated sub-programs into the active
APL workspace. This is accomplished by entering the system
command ")PCOPY DINLFNS LOWESS"¹. This command copies a
group of programs required to execute LOWESS. See [Ref. 16,
p. 107], for information about the APL GROUP command. The
computer responds by presenting WS size and "date-saved"
information when all programs have been loaded.
Initialization is now complete and the user is ready to
execute LOWESS by typing "LOWESS" and hitting enter. From
this point on, user entries are made in response to program
queries or instructions. Table V summarizes these
initialization procedures.





1 Underscored letters are obtained by typing the desired 
letter while depressing the APL ALT key. 
D. PROGRAM INITIALIZATION: COMBINED GRAPHICS MODE


As noted in Section B of this chapter, the combined
LOWESS-GRAFSTAT package can only be run on IBM 3277GA, 3279
or specially configured 3278 graphics display terminals.
Additionally, efficient operation of GRAFSTAT requires a
minimum workspace size of 2 megabytes. The W.R. Church
Computer Center has established a limited number of public
domain workspaces with special account numbers and passwords
to meet this need, [Ref. 5]. Hard copy graphics printers are
available for use with the 3277GA terminals located in
Ingersall, Root and Spanegall Halls. The remainder of this
section focuses on the use of the 3277GA terminals.

Data files stored on the user's personal disk are
unavailable for use while operating in one of the public
workspaces. Users may:

1. send files to the public workspace's user number
prior to logging on and commencing a work session;

2. link to his/her own disk after logging on to the
public workspace using CP link procedures outlined
in [Ref. 17].

After logging onto one of the public workspaces and
completing the data transfer or linking procedures described
above, the user must enter the APL sub-environment by typing
"APLGS7"¹ and hitting the enter key. The response "CLEAR WS"
indicates that the computer is ready to accept APL commands.

The special APL characters, labelled in black, are invoked
by depressing the APL ON/OFF key. Since this key also turns
the APL characters off, it may be necessary to check their
status by trial and error. Detailed instructions for using
the APL character set are presented in Section C of this
chapter.


1 The command "APLGS7" invokes special system routines
required to support the IBM GRAFSTAT software package. This
procedure may change. Contact Professor P.A.W. Lewis,
Department of Operations Research, if these procedures do
not work.

The initialization procedure is completed by loading
GRAFSTAT and LOWESS into the active APL workspace. GRAFSTAT
should be loaded first, by entering the system command
")LOAD GRAFSTAT". The GRAFSTAT package is quite large and may
take several minutes to load. The following set of user
instructions will appear on the screen when GRAFSTAT is fully
loaded:


THIS IS A NEW (5/17/84) RELEASE OF GRAFSTAT. IT RUNS ON THE
3277/GA OR ON THE 3278/79. IT HAS A NUMBER OF NEW FUNCTIONS.
YOUR OLD CONTROL VECTORS WILL WORK AS BEFORE. IF YOU )COPY
RATHER THAN )LOAD THIS WORKSPACE YOU MUST EXECUTE THE
FUNCTION LATENT BEFORE STARTING. [illegible] SCHEDULED
FOR 7/84.


TO BEGIN, TYPE: START


FOR MORE INFORMATION, TYPE: DESCRIBE


It is not necessary for the user to start, or even 
interact with GRAFSTAT to smooth a set of data: the GRAFSTAT 
message may be cleared by depressing the CLEAR key. 

Users who have the APL workspace DINLFNS stored on the
public workspace disk, or who are linked to their own
personal disk where it is stored, need only enter ")PCOPY
DINLFNS LOWESS" to complete the initialization process. The
computer responds by presenting WS size and date saved
information when all programs have been loaded.
Initialization is now complete and the user is ready to
execute LOWESS by typing "LOWESS" and hitting enter. From
this point on, user entries are made in response to program
queries or instructions. See Table VI for a summary of these
procedures.
E. OPERATION OF LOWESS


This section provides detailed descriptions of the user
inputs required during normal operation of LOWESS. The
discussion assumes that one of the initialization procedures
described in Sections C and D of this chapter has already
been completed.

Execution of the LOWESS program is initiated by typing
"LOWESS" and hitting the return key. Since the program is
interactive, it will respond with a series of queries or
instructions requesting the user to input data or make
decisions about the operation of the program. The exact
sequence of program initiated queries and instructions is
formulated in response to user inputs.

User-computer interactions required during execution of
LOWESS are categorized into two types: data input and
program operation.

Since the program cannot operate without data, the initial
concern of LOWESS is to locate and read the data set it is
about to smooth. Data can be read from the active APL
workspace, a stored APL workspace or from a stored CMS file.
Data that is not located in the active workspace must be
accessible from that workspace. This presents no problem when
the user is operating under his/her personal user number and
the data is stored on his/her disk. This may become a problem
when the user is logged on to one of the public workspaces
described in Section D of this chapter, and has not:

1. sent the data to the public workspace where he/she is
working and stored it on the associated A disk;

2. linked to his/her own disk prior to entering the APL
sub-environment, see Section D of this chapter.

Wherever the data is stored, it MUST be formatted into two
separate lists, one containing the X values and the other
containing the corresponding Y values of the points being
smoothed.

Data which resides in the active workspace as APL vectors¹
is entered into LOWESS when the user types the variable name
and hits enter in response to appropriate program requests.

Data which is stored in another APL workspace on the disk in
use or on a disk to which the user is linked will be
transferred to the active workspace by the sub-program
DATAINPUT. The user needs only to enter the workspace name
and variable names when requested. DATAINPUT will also read
and convert CMS files stored on the disk in use or on a disk
to which the user is linked, provided they are formatted as
described above and contain only numerical data. A mixture of
alphabetic and numeric characters in a CMS data file will
create an error and terminate execution of LOWESS. These data
transfer features will work equally well in either mode of
operation. The IBM GRAFSTAT program contains functions
entitled CMS READ and CMS WRITE that will convert data in
both directions when operating in the combined graphics mode.
Users will generally not need to use this feature of
GRAFSTAT, however.
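
Because a single non-numeric entry terminates execution, it
may be worth checking a data file before it is handed to
LOWESS. The Python sketch below is a hypothetical pre-check
and is not part of LOWESS or DATAINPUT. It assumes, for
illustration only, a text file with one X value and one Y
value per line; the function name is an assumption.

```python
def parse_xy(lines):
    """Split two-column numeric text into separate X and Y lists."""
    xs, ys = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 values, found {len(fields)}")
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            raise ValueError(f"line {lineno}: non-numeric entry {line.strip()!r}") from None
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = parse_xy(["16.0 -7.9", "", "16.5 -8.2"])
print(xs, ys)
```

The two lists returned correspond to the separate X and Y
lists that the program requires.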


Program Operation inputs include:

1. the value of the parameter F (selection considerations
are discussed in Chapter II Section C);

2. whether robust or non-robust smoothing is desired;

3. whether or not a plot of the original data and smoothed
curve is desired;

4. whether or not a plot of the absolute values of the
residuals and associated smoothed curve is desired;

5. X and Y axis labels for these plots.

Plots can only be generated while operating LOWESS in the
combined graphics mode. Requesting plots when GRAFSTAT has
not been loaded will produce an error and terminate
execution. Hard copies of plots may be obtained by depressing
the HARD COPY button on the bottom of the graphics screen.


1 In APL, a list of data points stored under a single
variable name is referred to as a vector. See [Ref. 14], for
further details.


TABLE IV

Summary of Terminal Requirements and
Available Outputs

              Stand-Alone Mode        Combined Graphics

Terminal      3277GA, 3278 or 3279    3277GA, 3279 or 3278
Required                              with graphics board

Additional
Software
Required                              IBM GRAFSTAT

Available
Output        Numerical:              Numerical:
                original X              original X
                original Y              original Y
                smooth                  smooth
                residuals               residuals

                                      Graphical:
                                        scatter plot
                                        smooth curve
                                        |residuals| vs Xi





TABLE V

Initialization Procedures, Stand-Alone Mode

Objective           User Inputs              Program Response

(1) enter APL       "APL"                    "CLEAR WS"
    environment

(2) invoke APL      APL ON/OFF key           none
    characters

(3) load LOWESS     )PCOPY DINLFNS LOWESS    "saved (date) (time)"
    and assoc.
    programs

TABLE VI

Initialization Procedures, Combined Graphics

Objective           User Inputs                Program Response

(1) enter APL       "APLGS7"                   "CLEAR WS"
    environment

(2) invoke APL      APL ON/OFF key             none
    characters

(3) load            ")LOAD GRAFSTAT"           initialization screen,
    GRAFSTAT                                   see Section D

(4) load            ")PCOPY DINLFNS LOWESS"    "saved (time) (date)"
    LOWESS

(5) execute         "LOWESS"
A. OVERVIEW


This chapter provides prospective users with detailed
instructions for using a FORTRAN program that accomplishes
the LOWESS curve smoothing procedure described in Chapter II.
The program, entitled LOWESS, will provide the user with CMS
files containing robust or non-robust Yi values and their
associated residuals. These data files can be used to create
plots of the raw and smoothed data points using DISPLA
[Ref. 7], EASYPLOT, or other W.R. Church computer center
supported IMSL or NON-IMSL plotting routines.

LOWESS is a completely interactive program. All user defined
parameters and option selections are entered in response to
program queries.

Although no FORTRAN programming skills are required to
operate LOWESS, users should become familiar with FORTRAN and
WATFIV operating system commands and also with the basic
XEDIT editor, by reading appropriate sections of [Ref. 18]
and [Ref. 19]. A limited ability to format and manipulate
data files will be helpful when using LOWESS or when
interacting with any of the plotting routines mentioned
earlier.


B. TERMINAL REQUIREMENTS


LOWESS can be run on any remote terminal attached to the
IBM computer located at the Naval Postgraduate School. The
DISPLA and EASYPLOT plotting routines require the use of the
IBM 3277GA graphics display terminals located in Ingersoll,
Root and Spanagel Halls. Plotting routines that use the
remote VERSATEC or line printers can be accessed from any
terminal.
C. PROGRAM INITIALIZATION (FORTRAN VERSION)


Since LOWESS is not a W.R. Church computer center
supported program, it is not available in any of the
center's public access libraries. Interested users should
contact Professor P.A.W. Lewis, Department of Operations
Research, U.S. Naval Postgraduate School, for information
concerning access to LOWESS and its supporting programs.
Copies of the programs listed in Table VII should be
obtained and stored on the user's A disk. Annotated copies
of the source codes are contained in Appendix B.


TABLE VII
Programs and Subroutines Required for the
Operation and Support of the FORTRAN Version of LOWESS

Filename     Filetype     Filemode

LOWESS       FORTRAN      1
LOWS         EXEC         1
PXSORT       FORTRAN      1
LLBQF        FORTRAN      1

PXSORT and LLBQF are contained in the IMSL library.
Users having access to these programs through the W.R.
Church computer center need not obtain personal copies.

The LOWS EXEC is used to activate system libraries and
to designate the CMS storage space required for LOWESS input
and output files. It is invoked by typing "LOWS EXEC" and
hitting the ENTER key. The file definitions contained in the
LOWS EXEC are listed in Table VIII. See [Ref. 17] for
information on the use of EXEC executive programs.

This EXEC defines enough file space to accommodate five
data sets. To smooth any of the data sets, the user need
only enter the appropriate file number when queried by
LOWESS.




TABLE VIII
Input and Output File Definitions Used in LOWS


File number     Filename     Filetype

     2          LOW2         DATA
     3          LOW3         DATA
     4          LOW4         DATA
     7          LOW7         DATA
     8          LOW8         DATA

It may become necessary to change these filenames to
avoid losing data when smoothing a large number of data sets
or when smoothing one set a number of times. This may be
accomplished in one of the following ways:

1. by entering the CMS command "XEDIT LOWS EXEC" and
   changing the appropriate names;

2. by using the CMS command "R (old filename) (old
   filetype) (old filemode) (new filename) (new filetype)
   (new filemode)" for each file needing to be changed,
   see [Ref. 18].

File management is important. It is absolutely imperative
that data input files have the same filename, filetype
and filemode listed in the LOWS EXEC, both to prevent
inadvertent smoothing of the wrong data and to prevent
program errors.


D. DATA FILES (FORTRAN VERSION)


LOWESS requires that data be input in two columns of
floating point constants in (2F15.5) format, X values on the
left and Y values on the right. This is accomplished by
creating a new file with the command "XEDIT (filename)
(filetype)." The filename and filetype chosen should be one
of those listed in Table VIII or one that is contained in
the user's own LOWS EXEC. Refer to [Ref. 19], chapter 2, for
more detailed instruction on creating files. The (2F15.5)
format requires that all input variables contain a decimal
point followed by no more than five decimal places. The X
values must be entered in the first fifteen spaces and the Y
values in the second fifteen spaces of each line (one set
per line).
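
The column layout described above can be illustrated with a short Python sketch. It is not part of the LOWESS package; the function names and the idea of preparing the file outside XEDIT are assumptions made only for illustration. Each record holds X right-justified in columns 1 to 15 and Y in columns 16 to 30, both with five decimal places.

```python
def format_2f15_5(x, y):
    """Return one input record laid out as FORTRAN (2F15.5):
    X in columns 1-15 and Y in columns 16-30, five decimals each."""
    return f"{x:15.5f}{y:15.5f}"


def write_lowess_input(path, points):
    """Write (x, y) pairs, one pair per line, to a hypothetical input file."""
    with open(path, "w") as handle:
        for x, y in points:
            handle.write(format_2f15_5(x, y) + "\n")


record = format_2f15_5(0.2, 0.841)
print(repr(record))
```

A file produced this way would still have to be transferred to the CMS disk under one of the names defined in the LOWS EXEC.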

The output from LOWESS is placed in a file designated by
the user. This can be the same file used for inputting the
(X,Y) values or a different one. A different file should be
used if the same data set is going to be smoothed with
several different parameters. The output is written in
(4F15.3) format. The first column is the original X values
ordered from smallest to largest. Column two contains the
corresponding Y values, while column three contains the
smoothed Ŷi values and column four contains the (Yi - Ŷi)
residuals.
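
As an illustration of the four-column output layout, the following Python sketch splits one (4F15.3) record into its fields. The function name is hypothetical and the sample record is made up for the example; it is not actual LOWESS output.

```python
def parse_4f15_3(line):
    """Split one (4F15.3) record into X, Y, smoothed Y and residual.
    Each field occupies exactly fifteen columns."""
    fields = [float(line[start:start + 15]) for start in range(0, 60, 15)]
    x, y, y_smooth, residual = fields
    return x, y, y_smooth, residual


# Made-up record: X = 1.0, Y = 2.5, smoothed Y = 2.25, residual = Y - smoothed Y
sample = f"{1.0:15.3f}{2.5:15.3f}{2.25:15.3f}{0.25:15.3f}"
print(parse_4f15_3(sample))
```

Slicing by column position, rather than splitting on blanks, mirrors how a FORTRAN formatted READ treats the record.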


E. OPERATION OF LOWESS (FORTRAN VERSION)


This section provides detailed descriptions of the user
inputs required during normal operation of LOWESS. The
discussion assumes that the LOWS EXEC has been properly
prepared and executed and that input files have been built
according to instructions presented in Section C of this
chapter.

Execution of the LOWESS program is initiated by typing
"WATFIV LOWESS * (XTYPE". Since the program is interactive,
it will respond with a series of queries or instructions
requesting the user to input data or make decisions about
the operation of the program.

The initial concern of LOWESS is to locate and read the
data set it is about to smooth. Data can only be read from
one of the files defined in the LOWS EXEC routine. The user
tells LOWESS what file to read by entering the appropriate
file number (2,3,4,7 or 8) in response to the instruction
"... NUMBER OF THE INPUT DATA FILE." The program
will terminate with an error if the LOWS EXEC was not
properly prepared or if the data file was not formatted as
described in the preceding section. Other program requested
inputs include:
1. the value of the parameter F (selection considera-
tions are discussed in Chapter II Section C);
2. whether robust or non-robust smoothing is desired;
3. the file number of the desired output file.
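
The role of the parameter F can be made concrete with a small Python sketch. Both program listings in the appendices appear to convert F into a number of points per strip by rounding N times F to the nearest integer; the function below follows that reading, and its name and error check are assumptions for illustration only.

```python
import math


def strip_size(n, f):
    """Number of points used in each local fit: N * F rounded to the
    nearest integer, with F restricted to 0 < F <= 1."""
    if not 0.0 < f <= 1.0:
        raise ValueError("F must satisfy 0 < F <= 1")
    return int(math.floor(n * f + 0.5))


print(strip_size(472, 0.5))
```

Larger values of F place more points in each strip and therefore give a smoother curve.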
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APPENDIX A 


APL PROGRAMS 


This Appendix contains annotated listings of the APL
programs written for this thesis. Source listings of the
system library programs used to support the CMSREAD function
called in the program DATAINPUT are not included.

LOWESS is an interactive program that executes the
Robust-Locally-Weighted Regression Scatter-Plot Smoothing
Procedure described in the preceding sections of this
paper. It calls the following subprograms: DATAINPUT,
REPEATCK, REGRES, REGRES2, PLOTQUERY and LOWS during
execution. Refer to Chapter IV for detailed user instructions.
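
For readers who cannot run APL, the overall procedure can be sketched in Python. This is a minimal sketch under stated assumptions and not a transcription of the listing that follows: it assumes tricube neighbourhood weights, local linear fits, bisquare robustness weights scaled by six times the median absolute residual, and two robustness passes. It selects each strip by a full sort rather than by advancing strip boundaries, and all function names are hypothetical.

```python
from statistics import median


def tricube(u):
    """Tricube neighbourhood weight: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Bisquare robustness weight: (1 - u^2)^2 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def weighted_line_at(x0, xs, ys, ws):
    """Value at x0 of the weighted least squares line through (xs, ys)."""
    sw = sum(ws)
    if sw <= 0.0:                      # no usable weights: plain average
        return sum(ys) / len(ys)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx < 1e-12:                    # degenerate strip: weighted average
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def lowess(x, y, f=0.5, robust_passes=2):
    """Return (sorted x, sorted y, smoothed y) for the fraction f."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(xs)
    q = max(2, min(n, int(n * f + 0.5)))   # points per strip
    rob = [1.0] * n                        # robustness weights
    fit = [0.0] * n
    for _ in range(robust_passes + 1):
        for i in range(n):
            # the q points nearest to xs[i] form the strip
            strip = sorted(range(n), key=lambda j: abs(xs[j] - xs[i]))[:q]
            d = max(abs(xs[j] - xs[i]) for j in strip)
            ws = [rob[j] * (tricube((xs[j] - xs[i]) / d) if d > 0 else 1.0)
                  for j in strip]
            fit[i] = weighted_line_at(xs[i], [xs[j] for j in strip],
                                      [ys[j] for j in strip], ws)
        resid = [yi - fi for yi, fi in zip(ys, fit)]
        m = median(abs(r) for r in resid)
        if m < 1e-12:                      # residuals effectively zero
            break
        rob = [bisquare(r / (6.0 * m)) for r in resid]
    return xs, ys, fit
```

Calling `lowess(x, y, f=0.5, robust_passes=0)` gives the non-robust smooth. The sketch returns the smoothed values only; residuals follow as the differences between the sorted Y values and the fit.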


t2xLOWESS 
[0] LOWESS;N;Q@;WX;J;72;A;B;Q;STRP3;U;Ds;TX;WT;Z2;8R;DA;DB;R;U1;M;RO; 
AR;RHS ;PROCEED;N1;PT;SKP;YS;F sROB; REG; XAXIS; YAXIS) 
PHDR ;Q55;QS56;PT 
C1] ara DO NOT MOVE OR ERASE; GRAFSTAT FUNCTION HEADER 
[2] ARMA GRAFSTAT WILL NOT ADD A LINE TO THIS FUNCTION WITHOUT 
C3) AnA THIS HEADER 
C4) AA | 
(3) ann LOWESS CALLS THE FOLLOWING PROGRAMS AND VARIABLES: 
[46] ARA DATAINPUT; REFEATCK; PLOTQUERY; REGRES; REGRES2; RPLT; 
C7) Ana NRPLT; RESPLT; SRESPLT 
[8] Aan 
C9) OPP+tsd 
C19) DATAINPUT 
4041) ILOx1 (PROCEED#'N' ) 
49f{2) 49 
C13) LO: YitveYE4x] 
ara Mx unas | ORDER DATA 
C15) "INPUT F 1... CO<SFS1)' 
C14) Qt+LO.5+Qe (Ni tpx) xFeo 
C17) 'pO YOU WANT TO USE LINEAR OR QUADRATIC FITTING DURING : 
C18] "THIS SMOOTHING ROUTINE?’ 
C19] "(LIN OR QUAD) ' 
[20] REG+{(40 
F210] 'nO YOU WANT TO USE THE ROBUST SMOOTHING OFTION?' 
[22 'CYES OR NO)' 
(23) ROB¢140 
[24] YS+N1 96 
(25) WXtNi pf 


C26] J+o COUNTER FOR ROBUST SMOOTHING LOOP 
[27] Pe | e 


(28) 1+6 
£29] Ae ; STARTS FIRST STRIP AT X, ae Xa 
[30] BtQ 


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C31] L2:IeI+{ INCREMEMENTS THROUGH X, ... Xn 
9(32] S9LEXICTONID - 
C33]  REPEATCK PREVENTS COMPUTATIONS OF Y,; FOR REPEAT X, 
9(34J] ALS X1CSKP='Y') 
(35] STRPe(At(O,1(B-A))) 
9(36]  - 9L3x104DEC/[Ue(XL1J°.-XESTRP]) | COMPUTES D, 
[37] YSCIJ#¢+/ OL ST/Y) 40 +/LSTHX=XCIJ>2 USES AVG Y IF D,=O 
9(38] 9L5 | 
C39] L3:WTeWXCSTRPIXTX¢( (1-¢ [U*3) #3) x0 |UEUFD) (1) TRICUBE WT FCN 
9(40] L4:9R2x1(REG#'L') 
C414] X{STRP] REGRES YCSTRP] WEIGHTED REGRESSIONS 
9(€42] 9L5 . 
C43] R2:XCSTRP] REGRES2 YCSTRP] 
9(44})) L5:9L2x1(BENidvCISN4) 
9C45] AL 2x10 (DAGCXLI +4 J-XCA]) )< (DBC XCB+i J-XCI+1]))) |ADVANCE STRIFE 
C46] AtAtt 
C47] BeB+{ 
9£48)] 9L5 
C49] Lé:ROFIRC4C | RERESY*(Y-YS))] 
4(5O0J] 9L410xK1(O4MEO.5x+/| (ROC (CP NI=2),1+LNI+2])) 
CSij] Uiet BICUBE WT FCN 
9(52] IL 4 f 
[53] LiO:U1eR=(6xM) 
C54] Lit: WXe((4-(€U1 #2) #20K 00 {UI C1) 
9(55]  - 9L 7X1 (CROR#'Y') 
9(546] 9Lixi(JUS2) 
(57] L7:PLOTQUERY RUN PLOTS 
C58] YSMTHeYS 
9C59] APLOxricPT#'y') 
9[ $0] A+0 
C64] L8:*THE OUTPUT FROM THIS LOWESS SMOOTHING IS STORED UNDER THE' 
C62] ‘FOLLOWING VARIABLE NAMES: ' 
C63] YSHTHinies oe SMOOTHED Y VALUES' 
C64] ' XA oenee eae X VALUES ARRANGED IN ASCENDING ORDER' 
(65] ' Tite en sa. ORIGINAI. Y VALUES! 
C46] !' RESY ........ RESIDUALS’ 


DATAINPUT controls the data entry portion of the procedure.
Data and program operating parameters are entered in
response to program queries. DATAINPUT accepts data that is
stored in the active APL workspace, transfers data from
other APL workspaces and converts CMS data into APL.
VARIABLE NAHE) ' 


DATAINFUT ;, QS1,;QS2,;QS94 
PROCEED 'Y'! 
4 4 


‘TS YOUR DATA SET LOCATED IN THIS WORKSPACE? ' 
'(YES OR NO)! 
QSitita 
ALFIKICQS{='N') 
venue THE NAHE OF THE X VAKIABLE' 
¢ 
"ENTER THE NAHE OF THE Y VARIABLE! 
Yeo 
4END 


LFi:'IS YOUR DATA LQCATED: ' 


' (4) IN AN APL WORKSPACE LOCATED ON THIS DISK OR ON A DISK' 
' THAT YOU ARE LINKED TQ, '! 

: (2) IN A CMS FILE ON THIS DISK OK ON A DISK THAT YOU AKE' 
‘ LINKED J0;' 

' (3) NEITHER (41) OR (2) ABOVE.’ 

"ENTER (1,2 OR 3)' 

QS2+0 

VIL F2,LFS,LP4)(QS2) 


LF2:'TO TRANSFEK YOUR DATA TO THIS WORKSPACE: ! 


; (1) TYPE ....PCOPY (WS NAME) (X VARIABLE NAHE) (Y 


EXAMPLE:  OFCOFY DATA X_ Y' 
TIF YOUK DATA IS STORED AS TWO SEPERATE YARIABLES' 
(2) TYPE ...)PCOPY (WS NAME) (VARIABLE NAHE)! 
EXAMFLE: 2dFCOFY DATA ARRAY' 
IF YOUR DATA IS STOKED UNDEK A SINGLE VARIABLE NAHME' 
AS IN A TWO DIMENSIONAL ARRAY' 


DATE AND TIME SAVED INFOKHATION JS DISPLAYED! 
WHEN THE TRANSFER ITS COMPLETE. THEN ENTER + GO 


=--UlUcemelmUmUClUCOlUlUlClCUCmUmUlClC Olle 


TQ CONTINUE THE LOWESS SHOOTHING FROCKAH'! 
SADATAINFUTEGO 


Z0:'DO YOU NEED TO DEFINE YOUR X AND Y YAKIABLES ANY FURTHER?! 


‘ANSWER NO IF YOU ENTERED SEPARATE X AND Y VAKIABLE NAHES' 
‘TIN THE PRECEDING STEP. OTHERWISE ANSWER YES.' 

"(YES OK NO)! 

QS3r1 th 

AENDKIVCQS3='N') 

‘DEFINE THE X YAKIABLE' 


" x*D 


"DEFINE THE ¥ VARIABLE! 
y+O 
4END 


LP3:'TO TRANSFER YOUK CHS DATA FILE TO THIS WORKSPACE: ' 


({) ANSWER THE FOLLOWING QUESTIONS ABOUT YOUR X DATA FILE' 
X+CHSKEAD 
(2) ANSWEK THE FOLLOWING QUESTIONS ABOUT YOUR Y DATA FILE' 


Y+CUSRKEAD 
‘YOU ARE NOW READY TO PROCEED WITH LOWESS' 


4END 


LP4:'YOUR DATA MUST BE STORED IN AN AFL WORKSPACE OR IN A CHS 
FILE! : 


‘LOCATED ON THIS DISK Of ON A DISK TO WHICH YOU ARE LINKED. — 


LOWESS' 


‘TS BEING TERMINATED. PLEASE COHPLY WITH CONDITION (1) OR (2) 


'AND REINITIATE LOWESS.' 
PROCEED+t'N' 


END: SADATAINPUTFO 
REPEATCK reduces the number of computations required to
smooth a data set by assigning the same smoothed Y value to
data points that have the same X value.


∇ REPEATCK
  SKP←'N'
  →END×⍳(I=1)
  →END×⍳(X[I]≠X[I-1])
  YS[I]←YS[I-1]
  SKP←'Y'
END:


PLOTQUERY controls the graphical output when operating
with the IBM GRAFSTAT statistical graphics package. It calls
the subprogram LOWS to smooth the absolute value of the
(Yi - Ŷi) residuals obtained from smoothing the original
data.


∇ PLOTQUERY
  ' '
  'DO YOU WANT A PLOT OF YOUR LOWESS SMOOTHED CURVE?'
  '(YES OR NO) ..... ENTER NO IF NOT USING GRAFSTAT'
  PT←1↑⍞
  →END×⍳(PT≠'Y')
  'INPUT X AXIS LABEL'
  XAXIS←⍞
  'INPUT Y AXIS LABEL'
  YAXIS←⍞
  →PL1×⍳(ROB≠'Y')
  PHDR←'ROBUST LOWESS SMOOTHING; F = ',⍕F
  RUN RPLT
  →PL2
PL1:PHDR←'NON-ROBUST LOWESS SMOOTHING; F = ',⍕F
  RUN NRPLT
PL2:'DO YOU WANT A PLOT OF |RESIDUALS| VS X?'
  '(YES OR NO)'
  QS5←1↑⍞
  →END×⍳(QS5≠'Y')
  'DO YOU WANT THIS PLOT SMOOTHED?'
  '(YES OR NO)'
  QS6←1↑⍞
  →PL3×⍳(QS6≠'Y')
  X LOWS(|RESY)
  RUN SRESPLT
  →END
PL3:RUN RESPLT
END:




LOWS is used to smooth the (Yi - Ŷi) residuals obtained
from smoothing the original data set. It operates exactly
like LOWESS except for the data input and graphical output
sections.


#eLOWwS 
C9] X LOWS YiN4;Q;WX;J;1;A;B;Q;STRP;U;D;TX;W ,Z; ;DA;DB;R; .M; 
Bar Aernaa om jWT;Z;BR;DA;DB;R;U1 5M; 
CiJ YeYC4x]J 
eZ X*XC 4X] 
C3] RHLO.S#Qe (Ni epX) xF 
C4) YSEN1 99 
fo WXEN4 of 
C6] Jee 
C7] Li: Jel+{ 
C8} [¢9 
C9] At § 
C19] BeQ 
CHiy L2:leles 
92f£12) ML SX1 CIN) 
ers REPEATCK 
2f£14) PLSXVCSKPe'Y') 
C13] STRP#€( A#(9, 1 (BA) )) 
2C4S] PL SK 1 OMDEL/ (Ue (XC IJ] °.-XCSTRP) 
E{7] WTewXCSTRP IX TX+Qo 4 
-C18] YSCIJ€(+/CLST/Y )) $C +/LSTOXa=XCI +4 J 
2C19] 7L5 
ClO] LO: WTewXCSTRP]XTX#( (4 = {UeS) ) 43) x C [Ue ED) C4) 
2C21] L4:2R2x1(REGA'L') 
C22] XCSTRP] REGRES YCSTRP] 
*(€ 23) *LS 
C24] R2:XCSTRP] REGRES2 YCSTRP] 
SEaiG L5:2L2x1(BON4 )vCISNG) 
9£25 PL2K1( (DA&(XLI+4 J-XCA])<(DBe( +{J- + 
as og CA} I< XCB+i J-XCI+1]))) 
C29] BeB+4 
9€29] 9L5 
C39] L6:ROe(RC4( [REC Y-YS) 0] 
9031} PLIOKUCOAMEO Sx 4+/( CROC CIN1 +2), 1+LN1=2])) 
C32) Ut e4 
>£33] 7L if 
C34] Lid: UieR+(6xM) 
C34 Lit: WXe¢ (4-(U4 2) 2) KC (101004) 
2(€36] 2L{i2x1(ROBA'Y') % 
2€37) 9L4IX1CU¢2) 
ESQ], LZ: 




REGRES computes linear least squares regressions of Y on
X while REGRES2 computes quadratic least squares regressions
of Y on X.
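
The two regressions differ only in the degree of the fitted polynomial, which the following Python sketch makes explicit. It forms the weighted normal equations and solves them by Gaussian elimination. This is an illustrative reconstruction under that assumption, not a transcription of REGRES or REGRES2, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def weighted_polyfit(xs, ys, ws, degree):
    """Coefficients b0..b_degree minimising sum of w * (y - sum b_k x^k)^2.
    degree=1 gives the linear fit, degree=2 the quadratic fit."""
    size = degree + 1
    a = [[sum(w * x ** (r + c) for w, x in zip(ws, xs)) for c in range(size)]
         for r in range(size)]
    b = [sum(w * y * x ** r for w, x, y in zip(ws, xs, ys))
         for r in range(size)]
    return solve(a, b)
```

The sketch does not guard against a singular system, which arises when a strip contains too few distinct X values with nonzero weight; a practical routine needs a fallback for that case, such as using an average of the Y values.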


xREGRES 
XR REGRES YR;DEN; Wi; 84; B2 
t Dene (4/1 Dx C4 JUL xR #2S I= CC/XRRUE CUT HO, .5) #2) 
af? aL Ext CC] DEN) 20.0001) 
: 4 YSETJ#(+/YR)=PYR 
ROECCCH/WI) X CES CUEXXRXYR) ) —-CCE/WEXXR) XC4/U9 XYR) ) DEN 
z nee ee es ono x Ce eWt ex) + CL/WHD 
7 YSCIJERI+R2XXEI] 
KREGRES2 
CO Yo REGhe eye 
( Ate Ct/X2x(WEHO.5)) 
1 Bet ares to 3) 3 
¢- Z ” é : 
; Rane 33 p(¥/WTHO.5) AL, AZ, AT, AZ, A3,A2,A3, (4+/ (X24) x(UTHO.5)) 
5 PHS 2 CEZYIRWIT HOLS) » CH/K2XVOXWTHO.5S) 
6 RUSZE A BRUS2, C+/(X2H2) XV2RUT HO. 5) 
2 IR¢ RHS 2HARS 
g YSCIJEBRE $34 74+(BRO2;17xXC1])+(BRO3; §xXCT]*2) 


The following character strings are the screen vectors
used by the RUN function of GRAFSTAT to produce the plots of
the LOWESS smoothed curves of the original data and of the
absolute value of the residuals.


‘MANEPLT ¢3 CHARACTER : 
AAOXIGVYI;YS9O 1F19. REX VOO84HS' 'GFHDROXAXISGYAXISG2ZIGLINGLINGI { 190 1 
6 6 


HRRESPLT 80 CHARACTER 
AAV EXOCIRESY)9OR1G. #txVacedeg' 'G' 'YEXAXISG' [RESIDUALS | ' Y229LINGLINGI { 
{99 1 0 08 
HERPLT 73 CHARACTER 


R4UXIGVI,YSHO 1919. 4+ K760NhRG' 'GPHDROXAXISGYAXISYZIOLINGLINGS § 190 1 
a) 


¥¥SRESPLT 85 CHARACTER 
AALYXOCPRESY) )YSYO 
1010. #+KVAO@APR' 8! 'YXAXISH! Pear otant V229LINGLINGI 4 1490 1 0 
08 
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APPENDIX B 


FORTRAN PROGRAMS 


This appendix contains a listing of the FORTRAN program
and subroutine written to support this thesis. The
programs LLBQF and PXSORT, used to support the LOWESS
program, are not listed. Detailed user instructions for
operating these programs are contained in Chapter V.


$JOB C 
REAL 
KX (200), ¥ (200), YS (200) 7200*0. 0/, WX (200) /200*1.0/,A (2,2) ,B(Z,1 
70 (209 7200*0.07,D,U T3006) 720080. 0/ W230 Of 726046604 
CoWR(22),DA,DB 7 § (205) 7200007, 81 (209) 7200.07, 80, F6C (4) 
Gee (2-1) HED 
AX,BX,A1¢0,11,12,13, 14, 15,16 ,17,18,19,110,N, 19K (2) , TER, HOF 
Gaee2 C 
BATA’ AX/1/ ,ROB/-1/,N/0/ C 
F=.33 
TF i=2 
IF2=4 
1 ae 
N=N+ 
READ (IF1,901, END=2) X(N) ,Y(N) 
GO TO 1 
2 N=N-1 
CALL XYSORT(X,Y,1,N) 
Q=TFIX ( (FLOAT (NJ*F) +. 5) 
4u CCNTIN 
AX=1 
2 1={AX-1) 
BX=0 
DO 65 I1=1,N 
T2=0 
D=0.0 
DO 10 I3=Ax,EX 
W713) ex (11) Kacey) 
IP (; NOT. ABS (U{Z2)) -GE-D) GO TO 5 
D=ABS (U(12)) 
IF (.NOT.D.G1.0.00001) GO TO 30 
5 ae iidon 0 
LF (LNCt. 1-49-41. 0) GO TO 15 
TX (14 = (1.07 (U14%3) ) #43 
wr (Taj =TX (14) ¥WX (A 1414) 
Conon 20 
15 CONTINUE 
TX (14) =0.0 
WT {14 =0.0 
20 CONTINUE 
25 CONTINUE 
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K=K-1 

ae oLTs Auth ie) eCOs iO Seo 
A (K+ 1)sT 

B (KS 1} =71 

GO TO 90 

END SENTRY 


The following LOWS EXEC routine sets the file definitions
and invokes the appropriate systems libraries required
to run LOWESS. This routine is executed by typing "LOWS
EXEC".


GLOBAL MACLIB IMSLSP NONIMSL

FILEDEF 02 DISK LOW2 DATA A (PERM
FILEDEF 03 DISK LOW3 DATA A (PERM
FILEDEF 04 DISK LOW4 DATA A (PERM
FILEDEF 07 DISK LOW7 DATA A (PERM
FILEDEF 08 DISK LOW8 DATA A (PERM


APPENDIX C


DATA SETS 


This appendix contains four data sets that were used to
compare LOWESS with MOVING AVERAGE, COSINE ARCH and LEAST
SQUARES REGRESSION routines in Chapter III. They include:


1. TEST SET ONE ... used to test LOWESS' ability to
   detect and follow linear trends.

2. TEST SET TWO ... used to check LOWESS' performance on
   data sets that contain abrupt changes in curvature.

3. TEST SET THREE ... used to test LOWESS' ability to
   follow smooth changes in curvature.

4. Lag-1 points from NEAR(1) data ... used to check
   LOWESS' performance on unequally spaced data.
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X 
~200 
400 
690 
. 890 

4.000 
{.209 
{.400 
{.600 
{.800 
2.990 
2.200 
2.409 
2.600 
2.890 
3.900 
3.209 
3.400 
3.600 
3.800 
4.900 
4.200 
4.409 
4.600 
4.800 
5.000 
3.209 
5.400 
2.409 
>.800 
6.000 
6.200 
&.4909 
6.4600 
6.800 
7.900 
7.200 
7.400 
7.699 
7.800 
8.909 
8.200 
8.409 
8.400 
8.809 
9.900 
JOO 
9.400 
9.600 
9.800 
{9.9000 


ao 7G 
~,841 
~ 4103 
{.156 
{.653 


1.416. 


1.136 
3.402 
1 aileove 
ene 
1.481 
2.821 
~669 
3.460 
{.897 
3.097 
2.340 
2.361 
1.911 
3.926 
4.412 
4.893 
6.147 
5.445 
2.852 
4.171 
3.258 
3.073 
5.487 
35.406 
6.532 
S959 
7.9500 
6.599 
6.766 
8.4659 
9.236 
aa 
W275 
7.935 
S2o7 
9.1465 
8.905 
8.939 
9.032 
B25 15 
8.840 
11.480 
8.796 
7.203 


TABLE IX 
Data Set One 


X 
190.200 
{9.400 
{9.600 
{0.800 
11.900 
14.200 
11.400 
{1.609 
11.800 
122,000 
12.200 
{2.4900 
{2.600 
{2.800 
13.900 
132200 
{3.400 
{3.600 
13. 800 
14.900 
14.200 
14.400 
{4.600 
14.800 
{5.900 
{5.200 
{5.400 
15.600 
{5.800 
16.900 
ro. 200 
14.400 
16.409 
{6.8090 
{7.000 
17.2090 
17.400 
{7.600 
{7.800 
18.9900 
{8.200 
18.400 
18.4600 
18.800 
19.000 
feo 
19.400 
17.600 
19.800 
29.900 


US 


Y 
8.696 
{9.305 
10429 7 
10.273 
14.345 
19.477 
12.4668 
11.5469 
tesocg 
14.180 
12.638 
PSS 
12.851 
12.490 
{22007 
12.815 
{4.558 
14.463 
{2.765 
13.807 
{2.900 
14.707 
192567 
{4.953 
Vas 
\oeao@ 
18.607 
16.1364 
{6.098 
{6.284 
17.160 
18.488 
18225 
14.605 
{7.017 
17.446 
16.546 
{8.758 
17.962 
19.557 
18.9006 
290.051 
16.701 
29.623 
17.482 
18.149 
19.450 
18.145 
20.267 
29.545 


X 
20.200 
20.400 
20.4600 
20.809 
21.000 
21.200 
21.400 
21.600 
21.800 
22.000 
Par BEEN, 
22.400 
22.000 
22.800 
23.900 
2oneug 
23.400 
23.699 
23.800 
24.9900 
24.290 
24.490 
24.600 
24.809 
23.000 
22.2909 
25.400 
25.609 
25.800 
26.900 
26.200 
26.400 
26.609 
26.800 
27.000 
212200 
27.400 
27.600 
2?.800 
28.009 
28.200 
28.409 
28.600 
28.800 
29.900 
27 200 
29.400 
29.600 
29.800 
30.9009 


Y 
21.520 
17.974 
21.018 
21.047 
21.704 
2i.go2 
20.408 
ao. 500 
21.418 
21.089 
21.204 
25.979 
22.441 
29.594 
22.802 
23.059 
23.811 
22.421 
Loa e2 
eendi? 
25.249 
24.703 
Bomoes 
24.870 
24.693 
26.589 
26.764 
26.23¢ 
26.2971 
26.801 
25.433 
26.764 
Ono] 
27.664 
26.922 
29.974 
21 coe 
29.872 
27.765 
26.499 
202969 
28.201 
ee 
27 ae 
2? ate 
28.834 
30771 t 
28-5802 
28.863 
27 es 


Y 
~.,462 


eet Vi 


{.405 
~947 
475 
~832 

~ a tee 

2.526 
~t?9 

2samd 

{.144 

lage 

~.496 
419 

2.446 
641 

{1.937 

{.980 

{.384 
~29f 
~419 

2.745 

{.795 

fie 

1.235 

22942 

2.194 

2.793 

acne 

3.156 

2.880 

a 

3.9015 

3.845 

3.529 
2993 

2.486 

2. Cae 

3.438 

2.689 

Seo 

4.967 

4.288 

3.788 

2.477 

3.610 

3.703 

3.283 

3.583 

4.4415 

5.578 

2576 

C2762 

3.203 

4.482 


Xx 


2400 
- 400 
eae 
. 
Bos 

2. ne 
{2.400 
{2.800 
{3.900 
{3.200 
13.400 
13.400 
13.800 
14.900 
{4.200 
14.400 
{4.400 
14.800 
{5.9000 
15.200 
{5.400 
{5.4600 
{5.800 
{6.990 
{6.200 
{6.499 
16.499 
{4.800 
17.000 
{7.200 
{7.400 
{7.400 
{7.800 
18.000 
{8.200 
{8.409 
{8.400 
{8.800 
{9.900 
19.200 
19.400 
19.609 
{9.800 
29.090 
20.200 
20.409 
20.400 
20.800 
21.009 
21.200 
21.4090 
21.600 
21.800 
22.009 


iy eee we 
at af af, ath 
® 


TABLE X 


Y 
3.849 
4.554 
a Oe 
3.159 
4.518 
52736 
4.989 
S./52 
Be o5 
4.052 
3.594 
3.895 
3.747 
4.171 
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